Abstract. We consider two problems related to polar codes.
First is the problem of polar codes construction and analysis of 
their performance without Monte-Carlo method. The formulas 
proposed are the same as those in [Mori-Tanaka], yet we believe 
that our approach is original and has clear advantages. The re- 
sulting computational procedure is presented in a fast algorithm 
form which can be easily implemented on a computer. Secondly, 
we present an original method of construction of concatenated 
codes based on polar codes. We give an algorithm for construction 
of such codes and present numerical experiments showing sig- 
nificant performance improvement with respect to original polar 
codes proposed by Arıkan. We use the term concatenated code
not in its classical sense (e.g. [Forney]). However we believe that 
our usage is quite appropriate for the exploited construction. 
Further, we solve the optimization problem of choosing codes 
minimizing the block error of the whole concatenated code under 
the constraint of its fixed rate. 

I. Introduction 

Research on the construction of coding systems whose
performance is close to the Shannon limit, while the encoding
and decoding algorithms have low complexity, has been going on
for more than 60 years.

A significant modern example of such a system is linear
codes with low-density parity checks (LDPC). Usually these
are binary linear block codes with a sparse parity-check matrix.
Decoding is performed via iterative algorithms whose convergence
is described by quite a few theoretical results, and in
general these algorithms perform well. In practice, LDPC
codes have good performance for high noise levels. There
exist experimentally constructed codes approaching the Shannon
limit very closely (e.g. [6]). However, in the high bandwidth
region, short LDPC codes exhibit a so-called "error floor", i.e.
a significant slowdown in the decrease of the decoding error
probability as the channel improves, which occurs due to
features of the decoding algorithm.

Polar codes were invented by E. Arıkan in 2008. They are
the first coding system possessing, on the theorem level, the
convergence to the Shannon limit for code length $N \to \infty$, as well
as fast encoding/decoding algorithms with complexity bound
$O(N \log_2 N)$. Thus polar codes are a significant theoretical
result.

On the other hand, the performance of polar codes in their
initial form presented by Arıkan is considerably inferior, for a
fixed code length, to other coding systems. To date there exist
many proposals for improvement of polar codes performance
(e.g. [11], [12]), yet work in this direction seems to be very
promising.

In this paper we consider two problems related to polar 
codes. First is the problem of polar codes construction and 
analysis of their performance for various types of binary 
symmetric channel without Monte Carlo method. The formulas 
proposed are the same as those in [10], yet we believe that
our approach is original and has clear advantages. Moreover, 
the resulting computational procedure is presented in a fast 
algorithm form which can be easily implemented on a com- 
puter. Secondly, we present an original method of construction 
of concatenated codes based on polar codes. We give an 
algorithm for construction of such codes and present numerical 
experiments showing significant performance improvement 
with respect to original polar codes proposed by Arıkan. It
should be noted that we use the term concatenated code not in
its classical sense (e.g. [Forney]). However we believe that our usage
is quite appropriate for the exploited construction. Our idea 
is simple. It is known that approaching the Shannon limit is 
possible only with sufficiently large code length. Increasing the 
code length however makes the problem of code construction 
with large minimum distance and efficient ML decoder very 
hard. The situation is different for low-noise channel. Here 
codes of moderate length are sufficient so that ML decoder 
complexity is not too large. In order to obtain those low- 
noise channels we employ the polarization effect observed by 
E. Arıkan in polar codes. Further, we solve the optimization
problem of choosing codes minimizing the block error of the 
whole concatenated code under the constraint of its fixed rate. 

Unfortunately, we do not have a theorem on asymptotic
optimality of our approach, or even on its clear advantage
over known approaches such as [11]. Yet the simplicity
of our approach, its flexibility and the possibilities for its
further improvement make it, we hope, interesting.

Other examples of concatenated and generalized concatenated
codes based on polar codes can be found in e.g. [11].

A word on the channels considered here. We assume that
the channel is defined by input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$
and transition function

\[ W(y \mid x) : \mathcal{Y} \times \mathcal{X} \to [0, 1] \]

defined for any pair $x \in \mathcal{X}$, $y \in \mathcal{Y}$. The function $W(y \mid x)$
defines the probability (or its density) that symbol y is received 
under the condition that symbol x was sent. For the sake of 
simplicity and in order to avoid generalized distributions we 
restrict our discussion to finite output alphabet. All formulas 
can be easily transplanted to the case of continuous channel 
by replacing the probabilities by the probability densities and 
replacing some sums by integrals. Note also that most fre- 
quently used channel models can be approximated by discrete 
ones. Moreover, data transmission systems used in practice 
represent output symbols with some fixed accuracy which is 
equivalent to some discrete channel model. 

Besides, we consider only symmetric channels with binary
input. In such channels, the input alphabet contains two
symbols:

\[ \mathcal{X} = \{0, 1\} = GF(2). \]

The output alphabet $\mathcal{Y}$ is a subset of real numbers, and the function
$W(y \mid x)$ possesses the following symmetry,

\[ W(y \mid 0) = W(-y \mid 1). \]

The rest of this paper contains 5 sections. In section II
we consider the problem of obtaining the optimal statistical
estimate of a bit variable restricted by a linear system. Results
of this auxiliary section are well known and belong to factor
graph theory. These results are used in obtaining relations
which determine the probability of erroneous bit decoding for
the ML decoder in section III. Our derivation essentially differs
from the original one proposed by E. Arıkan. It is based on explicit
representation of the factor graph of a polar code, its interpretation
as a set of trees and application of the density evolution method.
Note also that for polar codes we consider two types of factor
graphs: encoder graph and decoder graph. In section IV we
describe the polar code construction method in the form of
fast algorithms taking as input a discrete probability function
defined by the channel. Also presented are the analysis of the
obtained codes and numerical simulation for polar codes of
different length.

In section V we discuss the possibility of polar code
construction using polarization kernels other than $G_2$, which
was introduced in Arıkan's paper. Finally, in section VI
we introduce a class of concatenated codes based on polar
codes and present numerical comparison of concatenated and
classical polar codes performance.

II. Problem of bit variable estimation 

Before proceeding directly to polar codes, consider the 
problem of estimation of one random bit entering as a variable 
in a linear system. To this end we investigate two simpler 
problems: estimation of sum of two random bits transmitted 
through the channels and estimation of one random bit for 
which we have several independent sources of information. 
Actually, this section contains a short presentation of factor
graph theory, which is widely used in modern coding theory.

A. Estimation of sum of bits 

Let two independent random bits $x_1$ and $x_2$, taking the
values 0 and 1 equiprobably, be transmitted through channels
$W_1$ and $W_2$, respectively, resulting in received symbols
$y = [y_1, y_2]$. Using the channel model we compute the
logarithmic likelihood ratios (LLRs)

\[ L(x_i) = \ln l(x_i) = \ln \frac{\Pr\{y_i \mid x_i = 0\}}{\Pr\{y_i \mid x_i = 1\}} = \ln \frac{W_i(y_i \mid 0)}{W_i(y_i \mid 1)}, \quad i = 1, 2. \]

Assume the following quantity is required

\[ L(x_1 \oplus x_2) = \ln l(x_1 \oplus x_2) = \ln \frac{\Pr\{y \mid x_1 \oplus x_2 = 0\}}{\Pr\{y \mid x_1 \oplus x_2 = 1\}}, \]

i.e. estimate the sum of two bits provided $L(x_1)$ and $L(x_2)$
are known. Considering two possible equiprobable cases, we
get

\[ \Pr\{y \mid x_1 \oplus x_2 = 0\} = \tfrac{1}{2} \Pr\{y \mid x_1 = 0, x_2 = 0\} + \tfrac{1}{2} \Pr\{y \mid x_1 = 1, x_2 = 1\}. \]

Since $y_1$ and $y_2$ are transmitted independently,

\[ \Pr\{y \mid x_1 = 0, x_2 = 0\} = \Pr\{y_1 \mid x_1 = 0\} \Pr\{y_2 \mid x_2 = 0\}, \]
\[ \Pr\{y \mid x_1 = 1, x_2 = 1\} = \Pr\{y_1 \mid x_1 = 1\} \Pr\{y_2 \mid x_2 = 1\}. \]

Hence

\[ \Pr\{y \mid x_1 \oplus x_2 = 0\} = \tfrac{1}{2} \Pr\{y_1 \mid x_1 = 0\} \Pr\{y_2 \mid x_2 = 0\} + \tfrac{1}{2} \Pr\{y_1 \mid x_1 = 1\} \Pr\{y_2 \mid x_2 = 1\}. \]



In a similar way we get

\[ \Pr\{y \mid x_1 \oplus x_2 = 1\} = \tfrac{1}{2} \Pr\{y_1 \mid x_1 = 0\} \Pr\{y_2 \mid x_2 = 1\} + \tfrac{1}{2} \Pr\{y_1 \mid x_1 = 1\} \Pr\{y_2 \mid x_2 = 0\}. \]

Inserting the last two formulas in the likelihood ratio $l(x_1 \oplus x_2)$
and cancelling the factor $\tfrac{1}{2}$ we obtain

\[ l(x_1 \oplus x_2) = \frac{\Pr\{y_1 \mid x_1 = 0\} \Pr\{y_2 \mid x_2 = 0\} + \Pr\{y_1 \mid x_1 = 1\} \Pr\{y_2 \mid x_2 = 1\}}{\Pr\{y_1 \mid x_1 = 0\} \Pr\{y_2 \mid x_2 = 1\} + \Pr\{y_1 \mid x_1 = 1\} \Pr\{y_2 \mid x_2 = 0\}}. \]

Divide the numerator and the denominator by
$\Pr\{y_1 \mid x_1 = 1\} \Pr\{y_2 \mid x_2 = 1\}$:

\[ l(x_1 \oplus x_2) = \frac{\dfrac{\Pr\{y_1 \mid x_1 = 0\}}{\Pr\{y_1 \mid x_1 = 1\}} \cdot \dfrac{\Pr\{y_2 \mid x_2 = 0\}}{\Pr\{y_2 \mid x_2 = 1\}} + 1}{\dfrac{\Pr\{y_1 \mid x_1 = 0\}}{\Pr\{y_1 \mid x_1 = 1\}} + \dfrac{\Pr\{y_2 \mid x_2 = 0\}}{\Pr\{y_2 \mid x_2 = 1\}}}. \]

Using the likelihood ratios $l(x_1)$ and $l(x_2)$, rewrite the last
formula as follows,

\[ l(x_1 \oplus x_2) = \frac{l(x_1)\, l(x_2) + 1}{l(x_1) + l(x_2)}, \]



or using logarithms,

\[ L(x_1 \oplus x_2) = \ln \frac{e^{L(x_1) + L(x_2)} + 1}{e^{L(x_1)} + e^{L(x_2)}} = 2 \tanh^{-1}\left( \tanh\frac{L(x_1)}{2} \tanh\frac{L(x_2)}{2} \right). \]

For convenience introduce the binary operation

\[ a \,\square\, b = 2 \tanh^{-1}\left( \tanh\frac{a}{2} \tanh\frac{b}{2} \right). \]

Now,

\[ L(x_1 \oplus x_2) = L(x_1) \,\square\, L(x_2). \]

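Note some useful properties of the $\square$ operation:

\[ a \,\square\, b = b \,\square\, a, \quad \forall a, b \in \mathbb{R}, \]
\[ a \,\square\, (b \,\square\, c) = (a \,\square\, b) \,\square\, c, \quad \forall a, b, c \in \mathbb{R}, \]
\[ a \,\square\, 0 = 0, \quad \forall a \in \mathbb{R}, \]
\[ (-a) \,\square\, b = -(a \,\square\, b), \quad \forall a, b \in \mathbb{R}, \]
\[ a \,\square\, (+\infty) = a, \quad \forall a \in \mathbb{R}, \]
\[ a \,\square\, (-\infty) = -a, \quad \forall a \in \mathbb{R}, \]
\[ |a \,\square\, b| \le \min(|a|, |b|), \quad \forall a, b \in \mathbb{R}, \]
\[ \operatorname{sgn}(a \,\square\, b) = \operatorname{sgn} a \cdot \operatorname{sgn} b, \quad \forall a, b \in \mathbb{R}. \]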

We now extend the problem to three bits $x_1, x_2, x_3$. Let these
quantities be transmitted via channels $W_1, W_2, W_3$, respectively,
and symbols $y = [y_1, y_2, y_3]$ be received. Assume the
following quantity is required

\[ L(x_1 \oplus x_2 \oplus x_3) = \ln l(x_1 \oplus x_2 \oplus x_3) = \ln \frac{\Pr\{y \mid x_1 \oplus x_2 \oplus x_3 = 0\}}{\Pr\{y \mid x_1 \oplus x_2 \oplus x_3 = 1\}}. \]

Introduce a new variable $t$ taking values 0 and 1 equiprobably:

\[ t = x_1 \oplus x_2. \]

We can assume that $t$ was transmitted via a channel with the
following transition function,

\[ W(y_1 y_2 \mid t) = \Pr\{y_1 y_2 \mid x_1 \oplus x_2 = t\}, \]

and write its LLR value as

\[ L(t) = \ln \frac{\Pr\{y_1 y_2 \mid x_1 \oplus x_2 = 0\}}{\Pr\{y_1 y_2 \mid x_1 \oplus x_2 = 1\}} = L(x_1 \oplus x_2) = L(x_1) \,\square\, L(x_2). \]

Then

\[ L(x_1 \oplus x_2 \oplus x_3) = L(t \oplus x_3) = L(t) \,\square\, L(x_3) = (L(x_1) \,\square\, L(x_2)) \,\square\, L(x_3). \]

Since the $\square$ operation is associative, drop the parentheses:

\[ L(x_1 \oplus x_2 \oplus x_3) = L(x_1) \,\square\, L(x_2) \,\square\, L(x_3). \]

Using induction, we obtain the formula for an arbitrary number of
variables:

\[ L(x_1 \oplus x_2 \oplus \dots \oplus x_n) = L(x_1) \,\square\, L(x_2) \,\square\, \dots \,\square\, L(x_n). \]
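The box operation and the formula for a sum of several bits can be checked numerically. The following is a minimal Python sketch, not part of the original derivation: the names `box` and `llr_of_xor` are illustrative, and mapping saturated products to infinite LLRs is an assumption about how to handle floating-point round-off.

```python
import math
from functools import reduce

def box(a: float, b: float) -> float:
    """a box b = 2 * atanh(tanh(a/2) * tanh(b/2)); infinite inputs are allowed."""
    t = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    if abs(t) >= 1.0:
        # Both arguments are (numerically) certain, so the result is certain too.
        return math.copysign(math.inf, t)
    return 2.0 * math.atanh(t)

def llr_of_xor(llrs):
    """LLR of x_1 xor ... xor x_n given the individual LLRs (uses associativity)."""
    return reduce(box, llrs)

# LLR of the sum of three bits observed through independent channels.
print(llr_of_xor([0.5, 1.5, -2.0]))
```

Because the operation is associative and commutative, the order in which `reduce` folds the list does not affect the result beyond round-off.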

We now proceed to estimation of bit transmitted independently 
via several channels. 

B. Estimation of bit transmitted several times 

Let a random bit $x$ taking values 0 and 1 equiprobably be
transmitted via $n$ different channels $W_1, \dots, W_n$, resulting in
received symbols $y = [y_1, \dots, y_n]$. One can compute LLRs relying
only on one channel:

\[ L_i(x) = \ln l_i(x) = \ln \frac{\Pr\{y_i \mid x = 0\}}{\Pr\{y_i \mid x = 1\}} = \ln \frac{W_i(y_i \mid 0)}{W_i(y_i \mid 1)}, \quad i = 1, \dots, n. \]

We need to estimate $x$, that is

\[ L(x) = \ln l(x) = \ln \frac{\Pr\{y \mid x = 0\}}{\Pr\{y \mid x = 1\}}. \]

Since channels transmit symbols independently,

\[ \Pr\{y \mid x = \alpha\} = \prod_{i=1}^{n} \Pr\{y_i \mid x = \alpha\}, \quad \alpha = 0, 1. \]

Inserting the last formula in the expression for $l(x)$, we have

\[ l(x) = \prod_{i=1}^{n} l_i(x), \]

or taking logarithms,

\[ L(x) = \sum_{i=1}^{n} L_i(x). \]

Obtained formula gives the estimate of a bit for which we 
have several independent sources of information. 
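As a small illustration of the summation rule, the sketch below combines the LLRs of one bit sent three times. It assumes, purely for the example, that each channel is a binary symmetric channel with crossover probability `p`; the helper names `bsc_llr` and `combine` are hypothetical.

```python
import math

def bsc_llr(y: int, p: float) -> float:
    """LLR ln W(y|0)/W(y|1) for a binary symmetric channel with crossover probability p."""
    base = math.log((1.0 - p) / p)
    return base if y == 0 else -base

def combine(llrs):
    """LLR of a bit observed through several independent channels: the sum of the LLRs."""
    return sum(llrs)

received = [0, 0, 1]                       # three noisy copies of the same bit
total = combine(bsc_llr(y, 0.1) for y in received)
print(total)                               # positive, so 0 is the more likely value
```

With two observations voting for 0 and one for 1 on identical channels, two of the three terms cancel and the combined LLR equals that of a single observation of 0.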

C. Estimation of a bit entering a linear system 

Let a random vector variable $x = [x_1, x_2, \dots, x_n]^T$, whose
components take values 0 and 1 equiprobably, satisfy the linear
system

\[ Ax = 0, \]

where the matrix $A \in GF(2)^{m \times n}$ is exactly known. Assume
that the quantities $x_1, \dots, x_n$ are transmitted via channels
$W_1, \dots, W_n$, and the received symbols are $y = [y_1, \dots, y_n]$. Then
the initial LLRs are known:

\[ \lambda_i = \ln \frac{W_i(y_i \mid 0)}{W_i(y_i \mid 1)}, \quad i = 1, \dots, n. \]

Assume the following LLR is required

\[ L(x_i) = \ln \frac{\Pr\{y \mid x_i = 0\}}{\Pr\{y \mid x_i = 1\}} \]

without knowledge of $x$.



Note that if some component $x_i$ is exactly known, we can
assume that it is transmitted via a binary symmetric channel with
zero error probability, and

\[ \lambda_i = \begin{cases} +\infty, & x_i = 0, \\ -\infty, & x_i = 1. \end{cases} \]

Vice versa, if some bit $x_i$ is not transmitted, we can assume
that it is transmitted via an absolutely noisy channel with the
transition function

\[ W(0 \mid \alpha) = 1, \quad \alpha = 0, 1. \]

It is easy to see that in this case

\[ \lambda_i = 0. \]

If such bit enters only one equation, we can remove this 
bit and respective equation. If some equation contains only 
exactly known bits (with $\lambda_i = \pm\infty$), this equation also can be
removed. 

Associate matrix A with a bipartite undirected graph by 
the following rule. Each matrix row (i.e. each equation) is 
associated with a square vertex. Each matrix column (i.e. each 
bit variable) is associated with a round vertex. A round vertex 
and a square vertex are connected by an edge if respective 
matrix row and matrix column intersect at value one (i.e. if the 
respective variable enters respective equation). Such a graph 
is referred to as Tanner graph for the matrix A. 

We focus on just one bit variable, say $x_1$. If the Tanner
graph is disconnected, remove all connected components save
the one containing the vertex $x_1$. Now if removing some bit $x_p$
results in the emergence of $q$ graph components $A_1, A_2, \dots, A_q$,
our problem of estimating $x_1$ is split into $q$ smaller problems.
Let $Y_i$ be the subvector of $y$ containing only those components
which arise in transmission of bits entering the subgraph $A_i$.
Assume $p \ne 1$, and let $x_1 \in A_1$. We can assume that $x_p$
is transmitted via $q - 1$ different channels with transition
functions

\[ W_i(Y_i \mid \alpha) = \Pr\{Y_i \mid x_p = \alpha\}, \quad i = 2, \dots, q. \]

Compute LLRs of $x_p$ considering only channel $i$,

\[ L_i(x_p) = \ln \frac{W_i(Y_i \mid 0)}{W_i(Y_i \mid 1)}. \]

Then the subgraphs $A_2, A_3, \dots, A_q$ may be removed with
updating the initial $\lambda_p$ estimate to

\[ \lambda_p' = \lambda_p + \sum_{i=2}^{q} L_i(x_p). \]

Note that for each $i$ the problem of computation of $L_i(x_p)$ is
also a bit estimation problem formulated on a smaller graph
$A_i$ augmented by the vertex $x_p$.

Now assume that the Tanner graph is a tree, i.e. it is connected
and acyclic. Assume also that every equation contains
at least two variables. Choose the vertex $x_1$ as the tree root
vertex. The leaf vertices will be some subset of $x_2, \dots, x_n$.

Let the vertex $x_1$ be incident to equations $c_1, c_2, \dots, c_q$,
and let each vertex $c_i$, $i = 1, \dots, q$, be incident to variables
$x_{k_{i,1}}, \dots, x_{k_{i,v_i}}$, not counting $x_1$. Let $T_i^j$ be the subtree with
root at $x_{k_{i,j}}$, not counting the root itself (see fig. 1).

Let $I_i^j \subset \{1, 2, \dots, n\}$ be the index set of the variables entering
the subgraph $T_i^j$. For all $i, j$ define the set

\[ Y_i^j = \{y_r : r \in I_i^j\} \cup \{y_{k_{i,j}}\}, \]

i.e. the set of all symbols obtained via transmitting variables
entering the subtree rooted at $x_{k_{i,j}}$. Assume that for each pair
$i, j$ we know the LLRs

\[ L(x_{k_{i,j}}) = \ln \frac{\Pr\{Y_i^j \mid x_{k_{i,j}} = 0\}}{\Pr\{Y_i^j \mid x_{k_{i,j}} = 1\}}, \tag{1} \]

i.e. the estimates of the bits $x_{k_{i,j}}$ based only on the tree rooted at the vertex
$x_{k_{i,j}}$. This can be interpreted as transmitting each such bit via
the channel with the following transition function,

\[ W_i^j(Y_i^j \mid \alpha) = \Pr\{Y_i^j \mid x_{k_{i,j}} = \alpha\}. \]



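Write each equation $c_i$ in the following way:

\[ x_1 = x_{k_{i,1}} \oplus x_{k_{i,2}} \oplus \dots \oplus x_{k_{i,v_i}}, \quad i = 1, \dots, q. \tag{2} \]

We can assume that $x_1$ was transmitted via independent
channels $W_i$ with transition functions

\[ W_i(Y_i^1, Y_i^2, \dots, Y_i^{v_i} \mid \alpha) = \Pr\{Y_i^1, Y_i^2, \dots, Y_i^{v_i} \mid x_1 = \alpha\}, \quad i = 1, \dots, q. \]

Then the LLRs based on these channels have the form

\[ L_i(x_1) = \ln \frac{\Pr\{Y_i^1, Y_i^2, \dots, Y_i^{v_i} \mid x_1 = 0\}}{\Pr\{Y_i^1, Y_i^2, \dots, Y_i^{v_i} \mid x_1 = 1\}}. \]

Inserting (2) we get

\[ L_i(x_1) = \ln \frac{\Pr\{Y_i^1, Y_i^2, \dots, Y_i^{v_i} \mid x_{k_{i,1}} \oplus x_{k_{i,2}} \oplus \dots \oplus x_{k_{i,v_i}} = 0\}}{\Pr\{Y_i^1, Y_i^2, \dots, Y_i^{v_i} \mid x_{k_{i,1}} \oplus x_{k_{i,2}} \oplus \dots \oplus x_{k_{i,v_i}} = 1\}}. \]

Taking into account (1) and using the result of section II-A we
obtain

\[ L_i(x_1) = L(x_{k_{i,1}} \oplus x_{k_{i,2}} \oplus \dots \oplus x_{k_{i,v_i}}) = L(x_{k_{i,1}}) \,\square\, L(x_{k_{i,2}}) \,\square\, \dots \,\square\, L(x_{k_{i,v_i}}). \]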




Fig. 1. Tree-like Tanner graph given in the rooted tree form 



Finally, assuming $x_1$ to be transmitted via the channels
$W_1, W_2, \dots, W_q$ introduced above and also via its own original
channel, write

\[ L(x_1) = \lambda_1 + \sum_{i=1}^{q} L_i(x_1). \]

In order to compute $L(x_{k_{i,j}})$, we can apply the same reasoning
to the subtree rooted at $x_{k_{i,j}}$. Thus we have a recursive
algorithm computing $L(x_1)$. It is essentially equivalent to the
algorithm known as "belief propagation".
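The recursion over a tree-shaped Tanner graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple representation of the tree, the names `box` and `tree_llr`, and the numerical LLR values are all made up for the example and do not come from the paper.

```python
import math
from functools import reduce

def box(a: float, b: float) -> float:
    """a box b = 2 * atanh(tanh(a/2) * tanh(b/2))."""
    t = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    if abs(t) >= 1.0:
        return math.copysign(math.inf, t)
    return 2.0 * math.atanh(t)

def tree_llr(node) -> float:
    """LLR of the root bit of a tree-shaped Tanner graph.

    node = (lam, checks): lam is the channel LLR of the bit, checks is the
    list of incident equations, each given as the list of its other (child)
    bits. Every equation is assumed to have at least one child.
    """
    lam, checks = node
    total = lam
    for children in checks:
        # One equation contributes the box-combination of its children's LLRs.
        total += reduce(box, (tree_llr(child) for child in children))
    return total

# Example tree: x1 = x2 xor x3 and x3 = x4 xor x5, with x1 as the root.
x4 = (2.0, [])
x5 = (0.7, [])
x3 = (-0.5, [[x4, x5]])
x2 = (1.0, [])
x1 = (0.3, [[x2, x3]])
print(tree_llr(x1))
```

For this example the call evaluates 0.3 + box(1.0, -0.5 + box(2.0, 0.7)), i.e. the sum over incident equations of the box-combined subtree estimates plus the bit's own channel LLR.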

III. Polar codes 

In this section we consider polar codes exactly in the form
in which they were originally presented in [8], but take a slightly
different look. Let $u^0$ and $u^1$ be two independent random bits
taking values 0 and 1 equiprobably. Define two more bits

\[ x[0] = u^0 \oplus u^1, \qquad x[1] = u^1. \tag{3} \]

In matrix notation,

\[ [x[0], x[1]] = [u^0, u^1] \cdot G_2, \qquad G_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \]

Note that bits $x[0]$, $x[1]$ also take the values 0 and 1 equiprobably.
Construct the Tanner graph for the system (3) and denote
it as encoder graph (see fig. 2).

Fig. 2. The encoder graph

Since $G_2^{-1} = G_2$, we can rewrite the system (3) in
equivalent form

\[ u^0 = x[0] \oplus x[1], \qquad u^1 = x[1]. \]

Construct the Tanner graph for this system also and denote it
as decoder graph (see fig. 3).

Fig. 3. The decoder graph

Let bits $x[0]$ and $x[1]$ be transmitted via a given channel $W$
receiving symbols $y = [y^0, y^1]$. Assume the following LLR is
required

\[ L(u^0) = \ln \frac{\Pr\{y \mid u^0 = 0\}}{\Pr\{y \mid u^0 = 1\}}. \]

Using results of the section II we have

\[ L(u^0) = L(x[0]) \,\square\, L(x[1]), \]

where

\[ L(x[i]) = \ln \frac{W(y^i \mid 0)}{W(y^i \mid 1)}, \quad i = 0, 1. \]

Now assume that the value of $u^0$ is exactly known and that
we need

\[ L(u^1) = \ln \frac{\Pr\{y \mid u^1 = 0, u^0\}}{\Pr\{y \mid u^1 = 1, u^0\}}. \]

Using again section II we get

\[ L(u^1) = L(x[1]) + (L(x[0]) \,\square\, L(u^0)) = L(x[1]) + (-1)^{u^0} L(x[0]). \]

We proceed to recursive construction of the larger system and
then to the similar problem of determining one bit.

A. Hierarchical graph construction

Double the encoder graph taking two copies of each variable
and of each equation. Now let $u^0, u^1, u^2, u^3$ be the transmitted
random bits while $u_1^0[0], u_1^0[1], u_1^1[0], u_1^1[1]$ are their functions:

\[ u_1^0[0] = u^0 \oplus u^1, \quad u_1^0[1] = u^1, \quad u_1^1[0] = u^2 \oplus u^3, \quad u_1^1[1] = u^3. \tag{4} \]

On the other hand,

\[ u^0 = u_1^0[0] \oplus u_1^0[1], \quad u^1 = u_1^0[1], \quad u^2 = u_1^1[0] \oplus u_1^1[1], \quad u^3 = u_1^1[1]. \]

For consistency, set $u_0^i[0] = u^i$. Figure 4 gives the graph for
the system (4).

Fig. 4. Graph for the system of equations (4)

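Introduce four new variables (fig. 5),

\[ u_2^0[0] = u_1^0[0] \oplus u_1^1[0], \quad u_2^0[1] = u_1^1[0], \quad u_2^0[2] = u_1^0[1] \oplus u_1^1[1], \quad u_2^0[3] = u_1^1[1]. \tag{5} \]

In matrix notation,

\[ [u_2^0[0], u_2^0[1]] = [u_1^0[0], u_1^1[0]] \cdot G_2, \]
\[ [u_2^0[2], u_2^0[3]] = [u_1^0[1], u_1^1[1]] \cdot G_2. \]

Fig. 5. Graph for the system of equations (5)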



Again employ the relation $G_2^{-1} = G_2$ and express the old
variables in terms of the new ones (fig. 6),

\[ u_1^0[0] = u_2^0[0] \oplus u_2^0[1], \quad u_1^1[0] = u_2^0[1], \quad u_1^0[1] = u_2^0[2] \oplus u_2^0[3], \quad u_1^1[1] = u_2^0[3]. \tag{6} \]

Fig. 6. Graph for the system of equations (6)



We call the graph displayed on fig. 5 the encoder graph
and the graph displayed on fig. 6 the decoder graph. Now
we are able to repeat the whole operation, double the graph
and introduce eight new variables $u_3^0[i]$, $i = 0, \dots, 7$, and proceed
further. We can express the new variables in terms of the old ones,

\[ [u_{k+1}^i[2j], u_{k+1}^i[2j+1]] = [u_k^{2i}[j], u_k^{2i+1}[j]] \cdot G_2, \tag{7} \]

and vice versa:

\[ [u_k^{2i}[j], u_k^{2i+1}[j]] = [u_{k+1}^i[2j], u_{k+1}^i[2j+1]] \cdot G_2. \]



Assume we make $n$ steps and stop at introducing the new variables
$u_n^0[i]$. Indices in the expression $u_k^i[j]$ have the following
interpretation based on the decoder graph. The lower index $k$ specifies
the vertical "layer" of the graph of $2^n$ variables where the
given vertex is, if we count layers right to left. The bracketed index
$j$ specifies the independent group of variables in a layer. The upper
index $i$ specifies the variable inside a group.

Increasing the layer number by one doubles the number of
independent groups $J_k$, i.e.

\[ J_k = 2 J_{k-1}. \]

Layer 0 contains one group of $2^n$ variables, i.e. $J_0 = 1$. Hence

\[ J_k = 2^k. \]

Since every layer contains $2^n$ variables, the number of variables
in every group of layer $k$ is

\[ S_k = \frac{2^n}{J_k} = 2^{n-k}. \]

Thus, the upper index in the expression $u_k^i[j]$ has range 0 to $S_k - 1$,
while the bracketed index has range 0 to $J_k - 1$. Figure 7 shows
the decoder graph for $n = 4$.

Fig. 7. Decoder graph for n = 4.
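The layer-by-layer construction can be made concrete with a short Python sketch of the encoder recursion (7). It is an illustration only: the function name `encode` and the list-of-groups representation (entry `layer[j][i]` standing for the variable with upper index i in group j of the current layer) are assumptions of this example.

```python
def encode(u):
    """Map the message u (length 2^n, layer 0) to the codeword (layer n).

    Each step splits every group into two groups of half the size, following
    [u_{k+1}^i[2j], u_{k+1}^i[2j+1]] = [u_k^{2i}[j], u_k^{2i+1}[j]] * G_2.
    """
    layer = [list(u)]                      # layer 0: one group of 2^n bits
    while len(layer[0]) > 1:
        nxt = []
        for group in layer:
            half = len(group) // 2
            # Group 2j of the next layer: sums of consecutive pairs.
            nxt.append([group[2 * i] ^ group[2 * i + 1] for i in range(half)])
            # Group 2j+1 of the next layer: second element of each pair.
            nxt.append([group[2 * i + 1] for i in range(half)])
        layer = nxt
    return [group[0] for group in layer]   # layer n: 2^n groups of one bit

print(encode([0, 1, 0, 0]))                # prints [1, 0, 1, 0]
```

Every pass touches each of the $2^n$ bits once and there are $n$ passes, so the sketch performs on the order of $n \cdot 2^n$ bit operations.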

B. Problem of bit estimation 

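Let the variables $u_n^0[0], u_n^0[1], \dots, u_n^0[2^n - 1]$ constituting the
last layer of the graph be transmitted via some channel $W$ and
received as symbols $y = [y_0, y_1, \dots, y_{2^n - 1}]$. Using the channel
model, we have

\[ \lambda_j = \ln \frac{W(y_j \mid 0)}{W(y_j \mid 1)}, \quad j = 0, \dots, 2^n - 1. \]

Assume that for some $m$ we exactly know the quantities
$u_0^l[0]$, $l = 0, \dots, m - 1$.
Assume that the estimate of the next bit $u_0^m[0]$ is required, i.e.

\[ L(u_0^m[0]) = \ln \frac{\Pr\{y \mid u_0^m[0] = 0,\ u_0^l[0],\ l = 0, \dots, m - 1\}}{\Pr\{y \mid u_0^m[0] = 1,\ u_0^l[0],\ l = 0, \dots, m - 1\}}. \tag{8} \]

Denote the subvector of $y$ consisting of contiguous symbols from
index $j \cdot S_k$ up to index $(j + 1) \cdot S_k - 1$ inclusively by the symbol
$Y_k[j]$. Then the vector $y$ has the following representation in
terms of its parts $Y_k[j]$,

\[ y = [Y_k[0], Y_k[1], \dots, Y_k[J_k - 1]]. \]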



On the decoder graph, the subvector $Y_k[j]$ may be interpreted
as the components of $y$ corresponding to those bits $u_n^0[l]$ which
are strictly on the left of the variables of group $j$, layer $k$. For
example, if $n = 4$ (fig. 7) then group 1 of layer 2 consists of the
variables $u_2^0[1], u_2^1[1], u_2^2[1], u_2^3[1]$, while the subvector $Y_2[1]$ is
obtained via transmission of the bits $u_4^0[4], u_4^0[5], u_4^0[6], u_4^0[7]$.
Introduce the following notation,

\[ L(u_k^i[j]) = \ln \frac{\Pr\{Y_k[j] \mid u_k^i[j] = 0,\ u_k^l[j],\ l = 0, \dots, i - 1\}}{\Pr\{Y_k[j] \mid u_k^i[j] = 1,\ u_k^l[j],\ l = 0, \dots, i - 1\}}. \tag{9} \]

This means that $L(u_k^i[j])$ is the LLR for the bit $u_k^i[j]$ provided
that $Y_k[j]$ is received and that the quantities

\[ u_k^0[j], u_k^1[j], \dots, u_k^{i-1}[j] \]

are exactly known. Note that the formula (8) is a special case
of (9) for $k = 0$, and that for $k = n$ the formula (9) takes the
form

\[ L(u_n^0[j]) = \ln \frac{\Pr\{y_j \mid u_n^0[j] = 0\}}{\Pr\{y_j \mid u_n^0[j] = 1\}} = \lambda_j. \tag{10} \]

Our goal is to obtain a recursive formula for $L(u_k^i[j])$ in terms
of the quantities $L(u_{k+1}^l[\cdot])$ of the next layer. If we have it, we can
compute the required $L(u_0^i[0])$ using (10) as a recursion base.



Denote the subgraph consisting of the single vertex $u_n^0[j]$ by
$A_n[j]$. By induction, let $A_k[j]$ for $j = 0, \dots, J_k - 1$ be the union
of the subgraphs $A_{k+1}[2j]$, $A_{k+1}[2j + 1]$, of all vertices of group
$j$, layer $k$ and of the incident equations. On the graph drawing
we can interpret $A_k[j]$ as a subgraph whose vertices are all
bits of group $j$, layer $k$ and all vertices on the left of these.
For example, if $n = 4$, the subgraph $A_2[1]$ consists of the
variables

\[ u_4^0[4],\ u_4^0[5],\ u_4^0[6],\ u_4^0[7], \]
\[ u_3^0[2],\ u_3^1[2],\ u_3^0[3],\ u_3^1[3], \]
\[ u_2^0[1],\ u_2^1[1],\ u_2^2[1],\ u_2^3[1] \]

and the incident equations (fig. 7). Note that the subgraph $A_k[j]$
contains those and only those variables of layer $n$ whose
transmission results in the vector $Y_k[j]$.

C. Recursive formulas 

Here we find the expression for $L(u_k^i[j])$ under the constraint
$k < n$. All variables of layer 0 enter only one equation,
and for $k > 0$ we do not have any immediate information
for them, thus we remove them and the incident equations.
Now for $k > 1$ the same can be done for layer 1 etc. Finally
we retain only layers with index at least $k$. The graph will
be divided into $J_k$ connected components $A_k[l]$. Remove all
components save the one containing $u_k^i[j]$, which means that we
keep only the component $A_k[j]$. Denote $q = \lfloor i/2 \rfloor$. If the bits
$u_k^0[j], u_k^1[j], \dots, u_k^{2q-1}[j]$ are exactly known, then according to
the equation (7), exactly known are also the quantities

\[ u_{k+1}^l[2j], \quad u_{k+1}^l[2j + 1], \qquad l = 0, \dots, q - 1. \tag{11} \]

Hence the equations incident to $u_k^0[j], u_k^1[j], \dots, u_k^{2q-1}[j]$
may be removed: these equations contain only known
quantities. After the removal the vertices

\[ u_k^0[j], u_k^1[j], \dots, u_k^{2q-1}[j] \]

become isolated and also may be removed. The vertices

\[ u_k^{2q+2}[j], u_k^{2q+3}[j], \dots, u_k^{S_k - 1}[j] \]

are not transmitted and do not have any estimates, thus
also may be removed with the corresponding equations (one per
vertex). After these transformations the graph will have the
form depicted on fig. 8.

Fig. 8. The decoder graph after vertex removal

Fig. 9. The graph after substitution of vertices $u_{k+1}^q[2j]$, $u_{k+1}^q[2j + 1]$
instead of subgraphs $A_{k+1}[2j]$, $A_{k+1}[2j + 1]$, respectively.

Since removing the vertex $u_{k+1}^q[2j]$ divides the graph into
two components, one of which is $A_{k+1}[2j]$, we can assume
that the bit $u_{k+1}^q[2j]$ is transmitted via the channel with the
following transition function

\[ \widetilde{W}(Y_{k+1}[2j] \mid \alpha) = \Pr\{Y_{k+1}[2j] \mid u_{k+1}^q[2j] = \alpha,\ u_{k+1}^l[2j],\ l = 0, \dots, q - 1\}. \tag{12} \]

Also we can substitute the subgraph $A_{k+1}[2j]$ by the single
vertex $u_{k+1}^q[2j]$, with the following initial likelihood ratio,

\[ \lambda_{k+1}^q[2j] = \ln \frac{\widetilde{W}(Y_{k+1}[2j] \mid 0)}{\widetilde{W}(Y_{k+1}[2j] \mid 1)}. \]

Comparing (9) with (12), we conclude that

\[ \lambda_{k+1}^q[2j] = L(u_{k+1}^q[2j]). \]

Similarly, the vertex $u_{k+1}^q[2j + 1]$ can be substituted for the
whole subgraph $A_{k+1}[2j + 1]$, if we set

\[ \lambda_{k+1}^q[2j + 1] = L(u_{k+1}^q[2j + 1]). \]

The resulting graph is displayed on fig. 9. We know already that
for such a graph,

\[ L(u_k^{2q}[j]) = L(u_{k+1}^q[2j]) \,\square\, L(u_{k+1}^q[2j + 1]), \]
\[ L(u_k^{2q+1}[j]) = L(u_{k+1}^q[2j + 1]) + (-1)^{u_k^{2q}[j]} L(u_{k+1}^q[2j]). \tag{13} \]

Thus we have obtained the recursive formulas giving
$L(u_0^i[0])$ for all $i = 0, \dots, 2^n - 1$ using (10) as a recursion base.



D. Successive cancellation method 
Let the vector

\[ u = [u_0^0[0], u_0^1[0], u_0^2[0], \dots, u_0^{2^n - 1}[0]] \]

be the message required for transmission. Starting from $u$ we
can compute

\[ x = [u_n^0[0], u_n^0[1], u_n^0[2], \dots, u_n^0[2^n - 1]], \]

using the formulas (7). The vector $x$ will be considered as a
codeword and transmitted componentwise via a given symmetric
channel $W$ producing the vector $y$ at the receiver. We want to
recover $u$ using $y$. We do it sequentially bit by bit. First we
compute $L(u_0^0[0])$ and estimate the bit $u_0^0[0]$ as follows,

\[ \hat{u}_0^0[0] = \begin{cases} 0, & L(u_0^0[0]) > 0, \\ 1, & L(u_0^0[0]) < 0, \\ \text{choose randomly}, & L(u_0^0[0]) = 0. \end{cases} \]

Now assuming $u_0^0[0]$ exactly known, we compute $L(u_0^1[0])$,
estimate $u_0^1[0]$ etc. Each bit is estimated using the rule

\[ \hat{u}_0^i[0] = \begin{cases} 0, & L(u_0^i[0]) > 0, \\ 1, & L(u_0^i[0]) < 0, \\ \text{choose randomly}, & L(u_0^i[0]) = 0. \end{cases} \tag{14} \]

Finally we produce some estimate of the initial message $u$.
This decoding method is called successive cancellation. Of
course, the presented coding system is useless since the
redundancy is missing.
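The recursion (13) together with the decision rule (14) can be sketched directly in Python. The version below is deliberately naive: it recomputes the recursion from scratch for every bit, so it does not follow the fast $O(N \log_2 N)$ schedule, and it is meant only to show how the formulas fit together. The names `box`, `llr` and `sc_decode` and the optional `frozen` argument (indices forced to zero) are assumptions of this example.

```python
import math
import random

def box(a: float, b: float) -> float:
    """a box b = 2 * atanh(tanh(a/2) * tanh(b/2))."""
    t = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    if abs(t) >= 1.0:
        return math.copysign(math.inf, t)
    return 2.0 * math.atanh(t)

def llr(lam, known, i):
    """L(u^i) for one group, given its channel LLRs lam and known bits u^0..u^{i-1}."""
    if len(lam) == 1:
        return lam[0]                              # recursion base (10)
    q, half = i // 2, len(lam) // 2
    # Known bits of the two child groups follow from (7).
    left_known = [known[2 * l] ^ known[2 * l + 1] for l in range(q)]
    right_known = [known[2 * l + 1] for l in range(q)]
    la = llr(lam[:half], left_known, q)            # L(u_{k+1}^q[2j])
    lb = llr(lam[half:], right_known, q)           # L(u_{k+1}^q[2j+1])
    if i % 2 == 0:
        return box(la, lb)                         # first line of (13)
    return lb + (1 - 2 * known[2 * q]) * la        # second line of (13)

def sc_decode(lam, frozen=frozenset()):
    """Successive cancellation: estimate the bits one by one using rule (14)."""
    u = []
    for i in range(len(lam)):
        if i in frozen:
            u.append(0)
            continue
        value = llr(lam, u, i)
        if value == 0:
            u.append(random.getrandbits(1))
        else:
            u.append(0 if value > 0 else 1)
    return u

# Codeword x = [1, 1] of the message u = [0, 1], received with confident LLRs.
print(sc_decode([-4.0, -4.0]))                     # prints [0, 1]
```

A practical decoder would cache the intermediate LLRs and partial sums between bits instead of recomputing them, which is what brings the cost down to the stated complexity bound.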

E. Polar codes 

Choose some index set $F \subset \{0, 1, 2, \dots, 2^n - 1\}$. Denote
$K = 2^n - |F|$. Make a convention that the only possible
messages are those with bits from $F$ equal to zero. We call
these bits frozen, and the other ones information bits. Again we
use successive cancellation, however with a modified bit
estimation rule:

\[ \hat{u}_0^i[0] = \begin{cases} 0, & i \in F, \\ \text{choose using (14)}, & \text{otherwise}. \end{cases} \]

Since the admissible messages form a linear space of dimension
$K$ and codewords $x$ depend linearly on $u$, the set
of all codewords is also a linear space. In other words, we
have a linear block code of length $2^n$ and of rate $K/2^n$. It can
be shown that its generator is obtained by deleting rows with
indices in $F$ from the matrix

\[ R_n G_2^{\otimes n}, \]

where $G_2^{\otimes n}$ denotes the $n$-th Kronecker power of the matrix
$G_2$ and $R_n$ the bit reverse permutation matrix. The rule
describing this permutation is as follows: let the binary representation
of index $i$ be $a_{n-1} a_{n-2} \dots a_2 a_1 a_0$, then the element $i$
is swapped with the element indexed $a_0 a_1 a_2 \dots a_{n-2} a_{n-1}$
in binary representation. The code thus constructed is called the
polar code.



How to choose the set $F$ of frozen bits? Denote by $E_i$,
$i = 0, \dots, 2^n - 1$, the probability of erroneous detection of the
bit $u_0^i[0]$ using the successive cancellation method provided
that all previous bits are detected correctly and $F = \emptyset$. The
probability of block error $P_E$ with $F$ fixed may be estimated
from above as a sum of error probabilities for each information
bit, i.e.

\[ P_E \le \sum_{i \notin F} E_i. \tag{15} \]

The set $F$ may contain the indices of bits with maximal error
probabilities, which will minimize the upper bound (15) of
the block error probability. To this end, one has to compute
the probabilities $E_i$, which is discussed in section IV.



F. Complexity of encoding and decoding 

Polar codes would not have any practical value without fast
algorithms of encoding and decoding. The encoding process
is carried out by the recursive formulas (7)

\[ [u_{k+1}^i[2j], u_{k+1}^i[2j + 1]] = [u_k^{2i}[j], u_k^{2i+1}[j]] \cdot G_2 \]

and requires $n$ sequential steps $k = 1, 2, 3, \dots, n$. On each
of these steps, all variables of layer $k$ are defined. Since each
layer contains $2^n$ bits, the overall encoding complexity is
$O(n \cdot 2^n)$ operations, which is $O(N \log_2 N)$ if we introduce the
code length $N = 2^n$.

Decoding by the successive cancellation method using the
recursive formulas (13) requires computation of $n \cdot 2^n$ different
quantities $L(u_k^i[j])$ and of $n \cdot 2^n$ quantities $u_k^i[j]$ in a
more complex order. Hence the decoding complexity is also
$O(N \log_2 N)$ operations.

IV. Construction and analysis of polar codes 

Construction of a polar code of given length $N = 2^n$ and
rate $K/N$ for a given channel $W$ amounts to choosing the set $F$
of $N - K$ frozen bits. The choice which minimizes the block
error probability $P_E$ would be optimal. However, computation
of $P_E$ is complicated and it is reasonable to substitute its upper
bound in the minimization problem,

\[ \min_F \sum_{i \notin F} E_i, \]

where $E_i$ is the probability of erroneous detection of bit $i$
by successive cancellation under the assumption that all previous
bits are detected without error. In this formulation, it is
sufficient to choose the $N - K$ indices corresponding to the maximal
values of $E_i$ as the set $F$, provided that $E_i$ are known for
$i = 0, \dots, 2^n - 1$. Thus the polar code construction problem is
reduced to computation of the quantities $E_i$.
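Once the error probabilities are available, the selection step itself is a simple sort. The Python sketch below illustrates it; the function name `choose_frozen` and the numerical values in the usage example are invented for illustration and are not results from this paper.

```python
def choose_frozen(E, K):
    """Freeze the N - K positions with the largest E_i.

    Returns the frozen index set F and the resulting value of the upper
    bound, i.e. the sum of E_i over the information (non-frozen) positions.
    """
    N = len(E)
    worst_first = sorted(range(N), key=lambda i: E[i], reverse=True)
    frozen = set(worst_first[:N - K])
    bound = sum(E[i] for i in range(N) if i not in frozen)
    return frozen, bound

# Hypothetical per-bit error probabilities for N = 4, keeping K = 2 bits.
frozen, bound = choose_frozen([0.5, 0.3, 0.2, 0.01], K=2)
print(sorted(frozen), bound)
```

Ties between equal values of $E_i$ are broken by index order here, which is an arbitrary choice of the sketch.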

A. Likelihood ratios as random variables 

Since the channel is symmetric and the code is linear, in
computation of $E_i$ we can assume that the all-zeros codeword is
transmitted. In this case the probability to receive the vector
$y$ is

\[ p(y) = \prod_{i=0}^{N-1} W(y_i \mid 0). \tag{16} \]

By definition of $E_i$ we assume all bits $u_0^0, u_0^1, \dots, u_0^{i-1}$ zero.
In this case the recursion (13) takes the form

\[ L(u_k^{2q}[j]) = L(u_{k+1}^q[2j]) \,\square\, L(u_{k+1}^q[2j + 1]), \]
\[ L(u_k^{2q+1}[j]) = L(u_{k+1}^q[2j + 1]) + L(u_{k+1}^q[2j]). \tag{17} \]

The recursion base (10) remains unchanged:

\[ L(u_n^0[j]) = \lambda_j = \ln \frac{W(y_j \mid 0)}{W(y_j \mid 1)}. \]

Now the quantities $L(u_k^i[j])$ depend only on $y$ and do not
depend on $u = [u_0^0[0], \dots, u_0^{N-1}[0]]$, thus we consider $L(u_k^i[j])$
as random variables defined on the probability space $\mathcal{Y}^N$ with
probability measure (16).

According to (16), the quantities $y_0, y_1, \dots, y_{N-1}$
are mutually independent. Hence the quantities
$L(u_n^0[0]), L(u_n^0[1]), \dots, L(u_n^0[N - 1])$ are also mutually
independent, because every quantity $L(u_n^0[j])$ depends on
only one symbol $y_j$. The quantities $L(u_k^{i'}[j'])$ and $L(u_k^{i''}[j''])$
are independent for all $k > 0$, $i'$, $i''$ and $j' \ne j''$, because they
are defined by the recursive formulas (17) via non-intersecting
sets of $L(u_n^0[j])$.

Following the hard decision rule (14) we see that the bit $i$
is detected erroneously in all cases when $L(u_0^i[0]) < 0$ and in
half of the cases when $L(u_0^i[0]) = 0$. In other words, $E_i$ is$^1$

\[ E_i = \Pr\{L(u_0^i[0]) < 0\} + \tfrac{1}{2} \Pr\{L(u_0^i[0]) = 0\}. \]

Extend the problem of computation of $E_i$ to computation of the
distributions of the random variables $L(u_0^i[0])$. Denote by $f_k^i[j]$
the probability function$^2$ of the random variable $L(u_k^i[j])$:

\[ f_k^i[j](z) = \Pr\{L(u_k^i[j]) = z\}. \]



From the channel model we have $f_n^0[j]$, $j = 0, \dots, N - 1$:

\[ f_n^0[j](z) = \Pr\left\{ \ln \frac{W(y_j \mid 0)}{W(y_j \mid 1)} = z \right\} = \sum_{b \,:\, \ln \frac{W(b \mid 0)}{W(b \mid 1)} = z} W(b \mid 0). \]

We see that the distributions of the random variables
$L(u_n^0[j])$ are the same for all $j$. Formulas (17) imply that
for $k < n$ the distributions of $L(u_k^i[j])$ also do not depend on
$j$, i.e. $f_k^i[j'] = f_k^i[j'']$ for all $i, k, j', j''$. Therefore in what follows,
we drop the square brackets in the notation $f_k^i[j]$.

B. Recurrent relations for the distributions 

Here we show that the distributions $f_k^i$ satisfy recurrent
relations analogous to the formulas (17). We start from random
variables with odd indices:

$$L(u_k^{2q+1}[j]) = L(u_{k+1}^{q}[2j+1]) + L(u_{k+1}^{q}[2j]).$$

$^1$ If $L(u_0^i[0])$ has a continuous distribution, the term
$\tfrac{1}{2}\Pr\{L(u_0^i[0]) = 0\}$ should be deleted.

$^2$ In the continuous case, $f_k^i[j]$ will be the probability density function.



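Since $L(u_{k+1}^q[2j])$ and $L(u_{k+1}^q[2j+1])$ are i.i.d.,

$$f_k^{2q+1}(z) = \Pr\{L(u_k^{2q+1}[j]) = z\}
= \sum_{a, b \in \operatorname{supp} f_{k+1}^q:\ a + b = z} f_{k+1}^q(a)\, f_{k+1}^q(b),$$

where $\operatorname{supp} f_{k+1}^q = \{v : f_{k+1}^q(v) \ne 0\}$.
Rewrite the sum so that it goes over one index only:

$$f_k^{2q+1}(z) = \sum_{a \in \operatorname{supp} f_{k+1}^q} f_{k+1}^q(a)\, f_{k+1}^q(z - a). \tag{18}$$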



If $\operatorname{supp} f_{k+1}^q$ is a uniform mesh, the formula (18)
is nothing else but the discrete convolution of a sequence with itself.
In the continuous case, the formula (18) will have the form

$$f_k^{2q+1}(z) = \int_{-\infty}^{+\infty} f_{k+1}^q(a)\, f_{k+1}^q(z - a)\, da,$$

which is also the convolution of the function $f_{k+1}^q$ with itself.
Hence it is quite natural to call the probability function $h$
defined by the formula

$$h(z) = \sum_{a \in \operatorname{supp} f} f(a)\, g(z - a)$$

the convolution of the functions $f$ and $g$ with the notation
$h = f * g$.

Thus in the new notation

$$f_k^{2q+1} = f_{k+1}^q * f_{k+1}^q. \tag{19}$$

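Random variables with even indices are treated analogously.
Using the corresponding recursive formula

$$L(u_k^{2q}[j]) = L(u_{k+1}^{q}[2j]) \mathbin{\square} L(u_{k+1}^{q}[2j+1])$$

we write

$$f_k^{2q}(z) = \Pr\{L(u_k^{2q}[j]) = z\}
= \sum_{a, b \in \operatorname{supp} f_{k+1}^q:\ a \mathbin{\square} b = z} f_{k+1}^q(a)\, f_{k+1}^q(b).$$

Introduce the notation

$$(f \boxtimes g)(z) = \sum_{a \in \operatorname{supp} f,\ b \in \operatorname{supp} g:\ a \mathbin{\square} b = z} f(a)\, g(b)$$

and rewrite the recursion as

$$f_k^{2q} = f_{k+1}^q \boxtimes f_{k+1}^q. \tag{20}$$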



While the operation $\boxtimes$ is not a convolution in the usual
sense, we still will use this term. Now we have the recurrent formulas
(19) and (20) which give the required distributions of the random
variables $L(u_0^i[0])$ if the initial probability function

$$f_n^0(z) = \sum_{b:\ \ln\frac{W(b \mid 0)}{W(b \mid 1)} = z} W(b \mid 0)$$

is used as the recursion base.

The error probability $E_i$ is obtained from the probability
function $f_0^i$:

$$E_i = \sum_{a \in \operatorname{supp} f_0^i:\ a < 0} f_0^i(a) + \tfrac{1}{2} f_0^i(0).$$

In what follows we consider a simple case when the convolutions
$*$ and $\boxtimes$ of two functions are reduced to simple
operations on pairs of numbers.
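For probability functions with finite support, both convolutions can be evaluated directly from their definitions. The following is a minimal sketch, not the authors' implementation: probability functions are stored as dictionaries mapping an LLR value to its probability, all function names are hypothetical, and the operation on numbers is assumed to be the usual LLR combination $a \mathbin{\square} b = 2\operatorname{artanh}(\tanh(a/2)\tanh(b/2))$, evaluated here in an algebraically equivalent form.

```python
import math

def box(a, b):
    """a box b for LLRs; assumed equal to 2*atanh(tanh(a/2)*tanh(b/2))."""
    if math.isinf(a):
        return b if a > 0 else -b
    if math.isinf(b):
        return a if b > 0 else -a
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return (sign * min(abs(a), abs(b))
            + math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))

def convolve(f, g, op):
    """Probability function of op(X, Y) for independent X ~ f and Y ~ g."""
    h = {}
    for a, fa in f.items():
        for b, gb in g.items():
            z = op(a, b)
            h[z] = h.get(z, 0.0) + fa * gb
    return h

def star(f, g):
    """Convolution f * g (sum of LLRs), cf. formula (19)."""
    return convolve(f, g, lambda a, b: a + b)

def boxconv(f, g):
    """Convolution induced by the box operation, cf. formula (20)."""
    return convolve(f, g, box)

def error_probability(f):
    """Pr{L < 0} + Pr{L = 0} / 2 for an LLR with probability function f."""
    return sum(p for z, p in f.items() if z < 0) + 0.5 * f.get(0.0, 0.0)
```

For a binary symmetric channel with error probability 0.1, the initial function is `{L: 0.9, -L: 0.1}` with `L = math.log(9)`; one step of each recursion then yields error probabilities 0.1 (odd index) and 0.18 (even index).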

C. The case of binary erasure channel (BEC) 

The problem of polar code construction for the BEC was
solved in the very article where polar codes were introduced,
although in a different formulation [8]. The BEC scheme is
shown in fig. 10.




Fig. 10. Schematic view of binary erasure channel 

The binary erasure channel is a binary-input symmetric channel,
possibly the simplest one in terms of decoding: if the received
symbol is 1 or −1, the transmitted symbol is unconditionally
known. The initial probability function has a support of two
values, $f_n^0 = B_p$, where

$$B_p(z) = \begin{cases} 1 - p, & z = +\infty, \\ p, & z = 0, \\ 0, & \text{otherwise}. \end{cases}$$

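The quantity $p \in [0, 1]$ is termed the erasure probability.
Consider the convolution of the functions $B_p$ and $B_r$ for some
$p, r \in [0, 1]$. The sums $a + b$ over all possible pairs $(a, b)$,
$a \in \operatorname{supp} B_p$, $b \in \operatorname{supp} B_r$, have the form

$$0 + 0 = 0, \quad 0 + (+\infty) = +\infty, \quad (+\infty) + 0 = +\infty, \quad (+\infty) + (+\infty) = +\infty,$$

therefore $\operatorname{supp}(B_p * B_r) = \{0, +\infty\}$. Since the
value 0 is obtained only with $a = 0$, $b = 0$,

$$(B_p * B_r)(0) = pr, \qquad (B_p * B_r)(+\infty) = 1 - pr,$$

and

$$B_p * B_r = B_{pr}.$$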

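Now we turn to the convolution $B_p \boxtimes B_r$. Again consider
all possible pairs $(a, b)$ and the output of $a \mathbin{\square} b$:

$$0 \mathbin{\square} 0 = 0, \quad 0 \mathbin{\square} (+\infty) = 0, \quad (+\infty) \mathbin{\square} 0 = 0, \quad (+\infty) \mathbin{\square} (+\infty) = +\infty.$$

Again $\operatorname{supp}(B_p \boxtimes B_r) = \{0, +\infty\}$. Since
the value $+\infty$ is obtained only with $a = +\infty$, $b = +\infty$,

$$(B_p \boxtimes B_r)(+\infty) = (1 - p)(1 - r).$$

Therefore

$$(B_p \boxtimes B_r)(0) = 1 - (1 - p)(1 - r) = p + r - pr$$

and

$$B_p \boxtimes B_r = B_{p + r - pr}.$$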



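Thus, writing $f_k^i = B_{p_k^i}$, the formulas (19) and (20) take the form

$$p_k^{2q+1} = \left(p_{k+1}^q\right)^2, \qquad p_k^{2q} = 2p_{k+1}^q - \left(p_{k+1}^q\right)^2.$$

The recursion base will be $p_n^0 = p$, the erasure probability of
the channel. It only remains to note that

$$E_i = \tfrac{1}{2}\, p_0^i, \quad i = 0, \dots, 2^n - 1.$$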

The case of general symmetric channel is considered in the 
next subsection. 
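The BEC recursion is simple enough to state as a few lines of code. The sketch below is illustrative only (the function names are hypothetical): it descends from the channel value $p_n^0 = p$ to the $2^n$ values $p_0^i$, placing the result for index $2q$ before the one for index $2q+1$, and then picks as frozen the $N - K$ indices with the largest $E_i = p_0^i / 2$.

```python
def bec_erasure_probabilities(p, n):
    """Return [p_0^0, ..., p_0^{N-1}], N = 2**n, for a BEC with erasure probability p."""
    probs = [p]                           # layer n: the single value p_n^0
    for _ in range(n):                    # go from layer k+1 to layer k
        nxt = []
        for r in probs:
            nxt.append(2 * r - r * r)     # index 2q   (box-type convolution)
            nxt.append(r * r)             # index 2q+1 (convolution *)
        probs = nxt
    return probs

def choose_frozen_set(p, n, k):
    """Indices of the N-K bits with the largest E_i = p_0^i / 2."""
    errors = [0.5 * r for r in bec_erasure_probabilities(p, n)]
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return sorted(order[:len(errors) - k])
```

Since each step replaces $r$ by the pair $2r - r^2$, $r^2$, the average of the values on every layer stays equal to $p$, which gives a quick sanity check of an implementation.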

D. The general case 

In the general case the complexity of exact computation of 
convolutions becomes too high since the support cardinality 
(and memory requirements) for the probability function grows 
exponentially with the code length. Approximation of the 
probability functions using a uniform grid is quite natural. 
Denote by $\delta$ the grid step and by $\Omega_i$ the grid cell
number $i$ with $i = -Q, \dots, Q$:

$$\Omega_i = \left[i\delta - \tfrac{\delta}{2},\ i\delta + \tfrac{\delta}{2}\right), \quad i = 1 - Q, \dots, Q - 1,$$

$$\Omega_Q = \left[Q\delta - \tfrac{\delta}{2},\ +\infty\right], \qquad
\Omega_{-Q} = \left[-\infty,\ -Q\delta + \tfrac{\delta}{2}\right),$$

where $Q$ is a positive integer which will be called the number
of quantization levels. Thus the grid consists of $2Q + 1$ cells.
The points $i\delta$ will be called the grid nodes. All cells except
for the extreme ones have grid nodes as centers.



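Algorithm 1 Computation of the projection $f$ of $g * h$ with
arguments supported at grid nodes

    $\tilde f \leftarrow g * h$
    $f(x) \leftarrow 0 \ \forall x$
    for $i \leftarrow -Q, \dots, Q$ do
        $f(i\delta) \leftarrow \tilde f(i\delta)$
    end for
    for $i \leftarrow Q + 1, \dots, 2Q$ do
        $f(Q\delta) \leftarrow f(Q\delta) + \tilde f(i\delta)$
        $f(-Q\delta) \leftarrow f(-Q\delta) + \tilde f(-i\delta)$
    end for

Algorithm 2 Computation of the projection $f$ of $g \boxtimes h$ with
arguments supported at grid nodes

    $f(x) \leftarrow 0 \ \forall x$
    for $i \leftarrow -Q, \dots, Q$ do
        for $j \leftarrow -Q, \dots, Q$ do
            $k \leftarrow \operatorname{nearest}(i\delta \mathbin{\square} j\delta)$
            $f(k\delta) \leftarrow f(k\delta) + g(i\delta)\, h(j\delta)$
        end for
    end for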



Define the grid projection operator. Let $f$ be some probability
function. For each grid node, sum all nonzero values of
$f$ which belong to the corresponding grid cell:$^3$

$$\hat f(i\delta) = \sum_{z \in \Omega_i \cap \operatorname{supp} f} f(z). \tag{21}$$



Let $g$ and $h$ be two functions supported at grid nodes. The
convolutions $g * h$ and $g \boxtimes h$ can take nonzero values
outside the set of grid nodes. Use the projection (21) to restrict
the resulting function to the grid.

Note that the convolution $g * h$ can have nonzeros only at
points $i\delta$ with $i$ from $-2Q$ up to $2Q$. Among these points,
only those with $|i| > Q$ are not grid nodes. If $i > Q$, these
points belong to the rightmost cell $\Omega_Q$, and if $i < -Q$, to
the leftmost cell $\Omega_{-Q}$. Thus the projection operation for the
convolution result consists in adding the values outside the
interval $[-Q\delta, Q\delta]$ to the corresponding extreme node. The
convolution $g \boxtimes h$, on the contrary, is not supported outside
the interval $[-Q\delta, Q\delta]$ because
$|a \mathbin{\square} b| \le \min(|a|, |b|)$.

Denote by $\operatorname{nearest}(x)$ the index of the cell $\Omega_i$
containing the point $x$. In other words, $\operatorname{nearest}(x)$ is
the index of the grid node closest to $x$. Then the approximate
computation of the convolutions $*$ and $\boxtimes$ described above
corresponds to algorithms 1 and 2 respectively.

Since the grid is uniform, the convolution $\tilde f \leftarrow g * h$
from the first step of algorithm 1 may be computed in
$O(Q \log_2 Q)$ operations using the FFT. The rest of the algorithm
takes only $O(Q)$ operations, hence the overall complexity of
algorithm 1 is $O(Q \log_2 Q)$ operations. The complexity of
algorithm 2 is $O(Q^2)$, which is much worse.
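Algorithms 1 and 2 can be sketched in plain Python as follows. This is a hedged illustration rather than the authors' code: probability functions are lists of length $2Q + 1$ whose entry `i + Q` holds the value at node $i\delta$, the names `star_projected`, `box_projected` and `nearest` are hypothetical, the box operation is assumed to be the standard LLR combination, and the first step of algorithm 1 is done by direct summation in $O(Q^2)$ operations (an FFT-based routine would replace that double loop to reach the complexity quoted above).

```python
import math

def box(a, b):
    """Assumed LLR combination 2*atanh(tanh(a/2)*tanh(b/2)) for finite a, b."""
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return (sign * min(abs(a), abs(b))
            + math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))

def nearest(x, delta, Q):
    """Index of the grid cell containing x, clipped to the range [-Q, Q]."""
    return max(-Q, min(Q, math.floor(x / delta + 0.5)))

def star_projected(g, h, Q):
    """Algorithm 1: projection of g * h onto the grid."""
    full = [0.0] * (4 * Q + 1)            # g * h lives on indices -2Q..2Q
    for i, gi in enumerate(g):
        for j, hj in enumerate(h):
            full[i + j] += gi * hj
    f = full[Q:3 * Q + 1]                 # values at nodes -Q..Q
    f[0] += sum(full[:Q])                 # fold the left tail into node -Q
    f[-1] += sum(full[3 * Q + 1:])        # fold the right tail into node Q
    return f

def box_projected(g, h, Q, delta):
    """Algorithm 2: projection of the box-type convolution onto the grid."""
    f = [0.0] * (2 * Q + 1)
    for i in range(-Q, Q + 1):
        for j in range(-Q, Q + 1):
            k = nearest(box(i * delta, j * delta), delta, Q)
            f[k + Q] += g[i + Q] * h[j + Q]
    return f
```

Both routines preserve the total probability mass, which is a convenient invariant to check after each recursion step.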

$^3$ If $f$ is a probability density function, let $\hat f(i\delta) = \int_{\Omega_i} f(\omega)\, d\omega$.



Convolutions $*$ and $\boxtimes$ arise also in the problem of
optimizing the weight distributions for rows and columns of the
LDPC check matrix. Results from this area may be used for
the design of a fast version of algorithm 2, namely the
algorithm from [1]. It is based on the following inequalities
for the quantity $k$ appearing in line 4 of algorithm 2:

$$\min(|i|, |j|) - \left\lceil \frac{\ln 2}{\delta} \right\rceil \le |k| \le \min(|i|, |j|).$$

Also, $\operatorname{sgn} k = \operatorname{sgn} i \cdot \operatorname{sgn} j$,
i.e. the quantity $\operatorname{sgn} i \cdot \operatorname{sgn} j \cdot \min(|i|, |j|)$
estimates $k$ with an error not exceeding
$M(\delta) = \left\lceil \frac{\ln 2}{\delta} \right\rceil$. This observation
helps to reduce the complexity of the algorithm to $O(Q \cdot M(\delta))$
operations. Taking a finer grid makes $M(\delta)$ larger and the speedup
smaller, but the speedup remains noticeable. Let $A = \delta Q$ be the
rightmost grid node and $[-A, A]$ the segment containing all grid nodes.
Typical values used in our numerical experiments were $A = 60$,
$Q = 2^{13}$. In this case $\delta = A/Q \approx 0.007324$ and $M(\delta) = 95$.

Thus making the grid projection of the initial probability
function $f_n^0$ and substituting approximations for the exact
computations which use formulas (19) and (20), we obtain a
numerical method for computation of the error probabilities $E_i$.
While the accuracy analysis for this method remains an open
question, our numerical experiments show that good accuracy
can be achieved without refining the grid too much.

E. Performance analysis 

Construction procedure described above implies that the 
polar code is built for a concrete channel. In practice, channel 
properties may change with time, therefore it is important to 
analyze the performance of the constructed code for channel 
models with different noise levels. For most modern coding 
systems, in particular for low-density parity check codes, the 
only available tool is the Monte-Carlo simulation. 

For polar codes, such analysis is available in much less 
expensive way. To obtain the upper bound for the block error 
probability, one can compute the error probabilities Ei by the 
method used in code construction and sum these quantities 
over indices of information bits. Numerical experiments show 
that this estimate is quite accurate. 
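As a small illustration of this analysis, the sketch below designs a code for one channel and evaluates the bound for another. It uses the BEC, where $E_i = p_0^i/2$ is available in closed form from the recursion of subsection C; the function names are hypothetical and the example is not taken from the paper's experiments.

```python
def bec_bit_errors(p, n):
    """E_i = p_0^i / 2 for a BEC with erasure probability p, code length 2**n."""
    probs = [p]
    for _ in range(n):
        probs = [v for r in probs for v in (2 * r - r * r, r * r)]
    return [0.5 * r for r in probs]

def design_information_set(p_design, n, k):
    """Indices of the k bits with the smallest E_i on the design channel."""
    errors = bec_bit_errors(p_design, n)
    return sorted(sorted(range(len(errors)), key=errors.__getitem__)[:k])

def block_error_bound(info_set, p, n):
    """Sum of E_i over the information bits, evaluated on a BEC(p)."""
    errors = bec_bit_errors(p, n)
    return sum(errors[i] for i in info_set)
```

Calling `block_error_bound(info_set, p, n)` for a range of `p` with a fixed `info_set` traces the performance curve of one code over channels of different quality, with no simulation involved.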

F. Numerical experiments 

It is instructive to check the quality of the estimates given
by the described performance analysis method. To this end,
one can compare the Monte-Carlo simulation results and the
obtained estimate for some concrete code. Using a random
number generator, form a "received" vector $y$ satisfying the
channel model and decode it. For a large number of trials $N_T$,
the decoder will make $N_E$ errors. We can estimate the block
error probability as follows,

$$P_E \approx \frac{N_E}{N_T}.$$
Fig. 11. Performance of the polar code of length 1024, rate 1/2 on
binary symmetric channel estimated by Monte-Carlo simulations and proposed
analysis method (curves: Monte-Carlo, proposed estimate; horizontal axis:
channel error probability)

According to the central limit theorem, with probability of
about 95% this estimate lies within the confidence interval of
the radius

$$1.96 \sqrt{\frac{\sigma^2}{N_T}}, \tag{22}$$

where $\sigma^2$ is the variance of the random variable taking the
value of 1 if the decoder makes an error and 0 otherwise [13].
The exact value of $\sigma^2$ is

$$\sigma^2 = P_E (1 - P_E)$$

and while it is unknown, we can estimate it using the sample
variance formula

$$\sigma^2 \approx \frac{N_E}{N_T - 1}\left(1 - \frac{N_E}{N_T}\right).$$

Thus the Monte-Carlo method has an accuracy of the order
$N_T^{-1/2}$, which implies large computational costs. For $P_E \ll 1$,
obtaining an estimate whose confidence radius is 50% of $P_E$
requires, according to formula (22), some $4 \cdot 1.96^2 \cdot P_E^{-1}$
trials. For example, if $P_E = 10^{-7}$, one will need $15 \cdot 10^7$
trials. Further, for a radius of 10% of $P_E$ the number of trials
increases up to $100 \cdot 1.96^2 \cdot P_E^{-1}$. If $P_E = 10^{-7}$, this
number will be approximately $3.8 \cdot 10^9$. Therefore in Monte-Carlo
simulations we restrict the noise level to the interval corresponding
to $P_E$ exceeding $10^{-5}$.
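The trial counts quoted above follow from setting the radius (22) equal to a given fraction of $P_E$ and solving for $N_T$. A minimal sketch of this arithmetic, with hypothetical function names, is:

```python
import math

Z95 = 1.96  # two-sided 95% quantile of the standard normal distribution

def confidence_radius(n_errors, n_trials):
    """Radius (22) with sigma^2 replaced by the sample variance."""
    p_hat = n_errors / n_trials
    variance = n_errors / (n_trials - 1) * (1 - p_hat)
    return Z95 * math.sqrt(variance / n_trials)

def trials_needed(p_e, relative_radius):
    """Number of trials for which the radius (22) equals relative_radius * p_e."""
    return math.ceil(Z95 ** 2 * (1 - p_e) / (relative_radius ** 2 * p_e))
```

For instance, `trials_needed(1e-7, 0.5)` is about $1.5 \cdot 10^8$ and `trials_needed(1e-7, 0.1)` is about $3.8 \cdot 10^9$, in agreement with the figures in the text.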

For this experiment we constructed a polar code of length
1024 and rate 1/2 for a binary symmetric channel with error
probability 0.06. Monte-Carlo simulation was run for binary
symmetric channels with various error probabilities. Also, an
estimate of the block error probability was computed using the
proposed analysis method. The obtained graphs are shown in
fig. 11. One can see that the results produced independently
in two different ways are very close.

Consider now a different channel model, an AWGN channel
with binary input and additive normal noise. The output alphabet
for this channel is the real axis, while the transition function
has the form

$$W(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - (1 - 2x))^2}{2\sigma^2}\right), \quad x \in \{0, 1\}.$$

Fig. 12. Performance of the polar code of length 1024, rate 1/2 on AWGN
channel estimated by Monte-Carlo simulations and proposed analysis method

Fig. 13. Performance of polar codes of rate 1/2 and different lengths for
binary symmetric channel



In other words, transmission over such a channel amounts to
mapping input bits 0 and 1 to symbols 1 and −1, respectively,
and adding afterwards normal noise with zero mean and
variance $\sigma^2$. Note that this channel has a continuous output
alphabet and does not fit the theory of the previous sections. However
a similar numerical experiment is perfectly possible for a
discrete approximation of this channel. Instead of $\sigma^2$, on the
horizontal axis we plot the signal/noise ratio in decibels

$$\mathrm{SNR\ (dB)} = 10 \log_{10} \frac{1}{\sigma^2}.$$

We constructed a polar code of length 1024 and rate 1/2 for
the noise level 3 dB. Monte-Carlo simulation was run for
various noise levels. Also, an estimate was computed using the
proposed analysis method. The results are shown in fig. 12.
One can see that the graphs again are almost identical.
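One possible discrete approximation is the grid projection of the LLR density itself. For the transition function above, the LLR of an output $y$ equals $2y/\sigma^2$, so under the all-zeros assumption it is normal with mean $2/\sigma^2$ and variance $4/\sigma^2$. The sketch below integrates this density over the grid cells of subsection D; the function name and the default $Q$ are assumptions made for illustration (the value $A = 60$ is the one quoted earlier).

```python
import math

def awgn_llr_on_grid(snr_db, A=60.0, Q=1024):
    """Grid projection of the channel LLR distribution of a binary-input AWGN channel.

    Returns (f, delta); f[i + Q] is the mass of the cell centred at i * delta,
    with the two extreme cells extending to -inf and +inf.
    """
    sigma2 = 10 ** (-snr_db / 10)              # SNR (dB) = 10 log10(1 / sigma^2)
    mean, std = 2 / sigma2, 2 / math.sqrt(sigma2)
    delta = A / Q

    def cdf(x):
        return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))

    f = []
    for i in range(-Q, Q + 1):
        lo = 0.0 if i == -Q else cdf(i * delta - delta / 2)
        hi = 1.0 if i == Q else cdf(i * delta + delta / 2)
        f.append(hi - lo)
    return f, delta
```

The resulting list can serve as the recursion base $f_n^0$ for the grid versions of the convolutions; its negative part plus half of the mass at node 0 approximates the uncoded bit error probability of the channel.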



Fig. 13 shows performance graphs for polar codes of rate
1/2 and of lengths $2^{13} = 8192$, $2^{16} = 65536$ and $2^{18} = 262144$
for the binary symmetric channel. For code rate 1/2 and the binary
symmetric channel, the Shannon limit corresponds to $p = 0.11$.
One can see that the convergence to the Shannon limit is rather
slow. The next section is devoted to a generalization of polar codes
which allows one to increase the convergence rate.



V. Polarization kernels 

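For the definition of polar codes, the following matrix was
used in section III:

$$G_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

One can use another invertible matrix $G$ of arbitrary order $l$.
Then the hierarchical graph construction will involve taking $l$
copies of the encoder graph, instead of doubling. New variables
will be expressed in terms of old ones by the formula

$$\left[u_{k+1}^{i}[lj],\ u_{k+1}^{i}[lj+1],\ \dots,\ u_{k+1}^{i}[lj+l-1]\right]
= \left[u_k^{li}[j],\ u_k^{li+1}[j],\ \dots,\ u_k^{li+l-1}[j]\right] G,$$

and old variables in terms of new ones, by the formula

$$\left[u_k^{li}[j],\ u_k^{li+1}[j],\ \dots,\ u_k^{li+l-1}[j]\right]
= \left[u_{k+1}^{i}[lj],\ u_{k+1}^{i}[lj+1],\ \dots,\ u_{k+1}^{i}[lj+l-1]\right] G^{-1}.$$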



After $n$ steps of graph construction, the code length will be
$l^n$ and the decoder graph will consist of $n$ layers, each having
$J_k = l^k$ groups of $S_k = l^{n-k}$ variables. The vector of output
symbols $Y_k[j]$ corresponding to group $j$ of layer $k$ still will
consist of contiguous bits from index $j \cdot S_k$ up to index
$(j + 1) \cdot S_k - 1$ inclusively. The problem of computation of
the quantities

$$L(u_k^{li}[j]),\ L(u_k^{li+1}[j]),\ \dots,\ L(u_k^{li+l-1}[j])$$

using the values of

$$L(u_{k+1}^{i}[lj]),\ L(u_{k+1}^{i}[lj+1]),\ \dots,\ L(u_{k+1}^{i}[lj+l-1])$$

leads to a graph analogous to the one shown in fig. 9, this
time isomorphic to the Tanner graph of the matrix $G^{-1}$. In
general this problem cannot be reduced to the belief propagation
algorithm working on a tree and requires a number of operations
exponential in $l$. Computation of $L(u_k^{li+m}[j])$ is always
possible by enumeration of all possible events. For brevity,
denote $u_m = u_k^{li+m}[j]$, $x_m = u_{k+1}^{i}[lj+m]$, $Y = Y_k[j]$,
$x = [x_0, x_1, \dots, x_{l-1}]$. Then



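$$L(u_m) = \ln \frac{\Pr\{Y \mid u_m = 0;\ u_0, u_1, \dots, u_{m-1}\}}{\Pr\{Y \mid u_m = 1;\ u_0, u_1, \dots, u_{m-1}\}}. \tag{23}$$

Let $X_a$, $a = 0, 1$, be the set of all vectors $x$ such that

$$[u_0, u_1, \dots, u_{m-1}, a, *\,*\,*] = x G^{-1},$$

where $*\,*\,*$ stands for $l - m - 1$ arbitrary bits. Then

$$\Pr\{Y \mid u_m = a;\ u_0, u_1, \dots, u_{m-1}\}
= \frac{1}{|X_a|} \sum_{x \in X_a} \Pr\{Y \mid x\}
= \frac{1}{|X_a|} \sum_{x \in X_a} \prod_{t=0}^{l-1} \Pr\{Y^t \mid x_t\},$$

where $Y^t$ is component $t$ of $Y$. Inserting the last equality for
$a = 0, 1$ into the numerator and denominator of (23) and using
$|X_0| = |X_1|$, we get

$$L(u_m) = \ln \frac{\sum_{x \in X_0} \prod_{t=0}^{l-1} \Pr\{Y^t \mid x_t\}}{\sum_{x \in X_1} \prod_{t=0}^{l-1} \Pr\{Y^t \mid x_t\}}.$$

Dividing the numerator and denominator by
$\prod_{t=0}^{l-1} \Pr\{Y^t \mid x_t = 1\}$, we get

$$L(u_m) = \ln \frac{\sum_{x \in X_0} \prod_{t=0}^{l-1} l(x_t)^{1 \oplus x_t}}{\sum_{x \in X_1} \prod_{t=0}^{l-1} l(x_t)^{1 \oplus x_t}},$$

where

$$l(x_t) = \frac{\Pr\{Y^t \mid x_t = 0\}}{\Pr\{Y^t \mid x_t = 1\}}.$$

Using the last equality and $L(x_t) = \ln l(x_t)$, write

$$L(u_m) = \ln \frac{\sum_{x \in X_0} \exp\left(\sum_{t=0}^{l-1} (1 \oplus x_t) L(x_t)\right)}{\sum_{x \in X_1} \exp\left(\sum_{t=0}^{l-1} (1 \oplus x_t) L(x_t)\right)}. \tag{24}$$

This gives the recursive formula for computation of
$L(u_k^{li+m}[j])$ in the case of an arbitrary matrix $G$, however
involving sums with a number of terms exponential in $l$.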
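Formula (24) translates directly into an enumeration over the unknown trailing bits. The sketch below is a hypothetical illustration (the name `kernel_llr` and its argument layout are assumptions): instead of inverting $G$, it enumerates the vectors $u = [u_0, \dots, u_{m-1}, a, *\,*\,*]$ and forms $x = uG$ over GF(2), which describes the same sets $X_a$; the sums of exponentials are evaluated in the log domain for numerical stability.

```python
import itertools
import math

def kernel_llr(G, m, known_u, llr_x):
    """L(u_m) by formula (24) for an l x l binary kernel G (list of rows).

    known_u holds the already decided bits u_0..u_{m-1};
    llr_x holds the input LLRs L(x_0)..L(x_{l-1}).
    """
    l = len(G)

    def log_sum(a):
        exponents = []
        for tail in itertools.product((0, 1), repeat=l - m - 1):
            u = list(known_u) + [a] + list(tail)
            x = [sum(u[r] * G[r][t] for r in range(l)) % 2 for t in range(l)]
            exponents.append(sum((1 - x[t]) * llr_x[t] for t in range(l)))
        top = max(exponents)
        return top + math.log(sum(math.exp(v - top) for v in exponents))

    return log_sum(0) - log_sum(1)
```

For the $2 \times 2$ kernel the routine reproduces the familiar two-term recursion, which is a useful check before applying it to larger kernels; its cost grows as $2^{l-m}$ per call.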

A. Obtaining the recursive formulas 



For some polarization kernels, the formulas (24) may be
replaced by simpler relations containing the familiar operations $+$
and $\square$. For example, consider the matrix

$$G_3 = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}.$$

Note that $G_3^{-1} = G_3$ and draw the Tanner graph analogous to
the one shown in fig. 9 but corresponding to $G_3^{-1}$ (fig. 14).

Fig. 14. Graph for the recurrent relations for the matrix $G_3$

We now find the expression for $L(u_k^{3q}[j])$. The vertices
$u_k^{3q+1}[j]$ and $u_k^{3q+2}[j]$ may be removed from the graph
together with the incident equations. We obtain the graph shown in
fig. 15. Using the results of section II one can write

$$L(u_k^{3q}[j]) = L(u_{k+1}^{q}[3j]) \mathbin{\square} L(u_{k+1}^{q}[3j+1]) \mathbin{\square} L(u_{k+1}^{q}[3j+2]).$$

Fig. 15. Graph from fig. 14 after removal of excessive vertices

In order to find $L(u_k^{3q+1}[j])$, the quantity $u_k^{3q}[j]$ should
be considered known exactly. We have to remove the vertex
$u_k^{3q+2}[j]$ from the graph in fig. 14 and the incident equation
(fig. 16). From the last graph we conclude that

$$L(u_k^{3q+1}[j]) = L(u_{k+1}^{q}[3j+1]) + (-1)^{u_k^{3q}[j]} \left(L(u_{k+1}^{q}[3j]) \mathbin{\square} L(u_{k+1}^{q}[3j+2])\right).$$

Fig. 16. Graph from fig. 14 after removal of excessive vertex

Similarly we can write the third formula:

$$L(u_k^{3q+2}[j]) = L(u_{k+1}^{q}[3j+2]) + (-1)^{u_k^{3q}[j] \oplus u_k^{3q+1}[j]} L(u_{k+1}^{q}[3j]).$$

In an analogous way, one can try to obtain recurrent
formulas for an arbitrary polarization kernel. If the Tanner
graph for some fixed index is not tree-like, one can try to
amend it by adding some equations to other ones. Further, if
a cycle contains an exactly known bit, the cycle can be broken
by doubling the respective vertex. Unfortunately, starting from
$l = 5$ the tree-like graph can be obtained only for some
polarization kernels.

Those kernels which admit simple formulas also admit
simple code construction and analysis. For instance, in the
example considered above the recurrent formulas for the probability
functions have the form

$$\begin{aligned}
f_k^{3q} &= f_{k+1}^q \boxtimes f_{k+1}^q \boxtimes f_{k+1}^q, \\
f_k^{3q+1} &= f_{k+1}^q * \left(f_{k+1}^q \boxtimes f_{k+1}^q\right), \\
f_k^{3q+2} &= f_{k+1}^q * f_{k+1}^q.
\end{aligned}$$
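For the BEC these three recurrences again reduce to operations on numbers. The sketch below assumes the rules of subsection IV-C ($B_p * B_r = B_{pr}$ and the box-type convolution giving $B_{p + r - pr}$) and the recurrences for the kernel $G_3$ in the form $f^{3q} = f \boxtimes f \boxtimes f$, $f^{3q+1} = f * (f \boxtimes f)$, $f^{3q+2} = f * f$; the function name is hypothetical.

```python
def bec_g3_erasure_probabilities(p, n):
    """Erasure probabilities of the 3**n bit channels for the kernel G_3 on a BEC(p)."""
    probs = [p]
    for _ in range(n):
        nxt = []
        for r in probs:
            pair = 2 * r - r * r               # box-type convolution of two copies
            nxt.append(pair + r - pair * r)    # index 3q:   box of three copies
            nxt.append(r * pair)               # index 3q+1: f * (f box f)
            nxt.append(r * r)                  # index 3q+2: f * f
        probs = nxt
    return probs
```

As with the $2 \times 2$ kernel, the three new values average to the old one, so the mean erasure probability is preserved on every layer.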



For polarization kernels which do not admit simple recurrent
formulas, the problem of code construction is open. In general,
the computation of the probability functions $f_k^{li+m}$ via
$f_{k+1}^{i}$ is a multidimensional summation problem. Possible
solutions are the Monte-Carlo method and approximations by the
normal distribution.

VI. Concatenated polar codes 

In this section we consider a method of performance 
improvement for polar codes in which short classic error 
correcting codes are used together with polar codes. 

Let $C_1, C_2, \dots, C_q$ be a set of linear codes of equal length
$M$. Let $K_i$ be the number of information bits in the code $C_i$.
Let $V$ be some $M \times N$ matrix each of whose elements is
0 or 1. Denote by $v_{j,i}$ with $0 \le i < N$ and $0 \le j < M$ the
elements of $V$, by $v^j$ its row $j$ and by $v_i$ its column $i$. For
all $i = 0, \dots, N - 1$ choose some integer $a_i$ in the range 1 to
$q$. We consider only such matrices $V$ whose columns $v_i$ are
codewords of $C_{a_i}$, i.e.

$$v_i \in C_{a_i}, \quad i = 0, \dots, N - 1. \tag{25}$$



Consider an arbitrary polar code of length $N$ and rate 1, i.e.
without redundancy, with generator matrix $G \in GF(2)^{N \times N}$.
Encode each row of $V$ with this polar code, obtaining a new
matrix $X \in GF(2)^{M \times N}$:

$$X = VG. \tag{26}$$

If the matrix $X$ is "reshaped" into a row, one can consider
the set of all such possible rows subject to the restriction (25) as
a linear code of length $M \cdot N$ and rate

$$\frac{K}{MN} = \frac{1}{MN} \sum_{i=0}^{N-1} K_{a_i}.$$

We will call the linear code thus obtained the concatenated polar
code. Let $Y$ be the matrix received after the transmission of $X$
through the channel and let $y^j$ be its row $j$. The decoder works
by alternating the steps of the successive cancellation
method for the rows of $Y$ and the maximum likelihood decoder for
its columns.

In order to decode the column $v_0$, compute for each row of
$Y$ independently the logarithmic likelihood ratios

$$L(v_{j,0}) = \ln \frac{\Pr\{y^j \mid v_{j,0} = 0\}}{\Pr\{y^j \mid v_{j,0} = 1\}}$$

just like in the usual successive cancellation method. Then
the values $L(v_{j,0})$, $j = 0, \dots, M - 1$, gathered in a vector $y$
are given as input to the ML-decoder for the code $C_{a_0}$. The most
likely codeword $w \in C_{a_0}$ produced on output is taken as an
estimate of $v_0$.

Next we compute the estimate of $v_1$. Assuming $v_0$ already
known, again compute for each row independently the LLRs

$$L(v_{j,1}) = \ln \frac{\Pr\{y^j \mid v_{j,1} = 0;\ v_{j,0}\}}{\Pr\{y^j \mid v_{j,1} = 1;\ v_{j,0}\}},$$

concatenate the values $L(v_{j,1})$ into a vector $y$, which will
be the input of the ML-decoder for the code $C_{a_1}$. The obtained
codeword is taken as an estimate of $v_1$. Next, assuming $v_0$
and $v_1$ exactly known, compute the estimate of $v_2$ etc.

Note that the polar codes are a special case of concatenated
polar codes for $M = 1$, $q = 2$ and $C_1 = \{0\}$, $C_2 = \{0, 1\}$. In
this case, bit $i$ is frozen if $a_i = 1$, and it is an information
bit if $a_i = 2$.
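For short codes the ML step on a column can be carried out by exhaustive search. The sketch below is an illustration only, with hypothetical names: it lists the codewords spanned by a set of generator rows and returns the one minimizing $\sum_j c_j L_j$, which is the ML criterion for LLRs defined as $\ln(\Pr\{\cdot \mid 0\} / \Pr\{\cdot \mid 1\})$. It says nothing about how ML decoding was implemented in the reported experiments.

```python
import itertools

def codewords(generator_rows):
    """All codewords of the binary linear code spanned by generator_rows."""
    length = len(generator_rows[0])
    words = set()
    for coeffs in itertools.product((0, 1), repeat=len(generator_rows)):
        word = tuple(
            sum(c * row[j] for c, row in zip(coeffs, generator_rows)) % 2
            for j in range(length)
        )
        words.add(word)
    return sorted(words)

def ml_decode(llrs, code):
    """Codeword c minimizing sum_j c_j * llr_j (a positive LLR favours bit 0)."""
    return min(code, key=lambda c: sum(cj * lj for cj, lj in zip(c, llrs)))
```

The search visits $2^{K_i}$ codewords, so it is practical only for small $K_i$ or, by symmetry, for codes whose dual is small.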

A. Code construction and analysis 

Let $E_i$ be the error probability for estimation of column $i$
under the constraint that all previous columns were estimated
error-free. Write again the upper bound for the block error
probability:

$$P_E \le \sum_{i=0}^{N-1} E_i. \tag{27}$$

Fix some symmetric channel $W$, a set of codes $C_1, \dots, C_q$
of length $M$, and a polar code of length $N$ and rate 1. We require
to construct a concatenated polar code of given rate $K/(MN)$, i.e.
choose numbers $a_0, a_1, \dots, a_{N-1}$ such that

$$\sum_{i=0}^{N-1} K_{a_i} = K. \tag{28}$$



We will choose these numbers so as to minimize the upper
bound (27). Denote by $E_i^k$ the error probability for estimation
of the column $v_i$ under the constraint that all previous columns
were estimated error-free and $a_i = k$. Note that $E_i^k$ does
not depend on $a_j$ for all $j \ne i$. For a concrete choice of
$a_0, a_1, \dots, a_{N-1}$ we can write the following upper bound for
$P_E$,

$$P_E \le \sum_{i=0}^{N-1} E_i^{a_i}. \tag{29}$$



Assume for now that for all $i = 0, \dots, N - 1$ and
$k = 1, \dots, q$ we can compute $E_i^k$. In order to choose the optimal
$a_0, a_1, \dots, a_{N-1}$, we will use the dynamic programming
method. Let $F(s, t)$, $s = 0, \dots, N - 1$, $t = 0, \dots, K$, be the
minimal possible value of the sum

$$\sum_{i=0}^{s} E_i^{a_i}$$

under the constraint

$$\sum_{i=0}^{s} K_{a_i} = t, \tag{30}$$

or let $F(s, t) = +\infty$ if there is no set of $a_i$ satisfying (30).
For convenience, set $F(s, t) = +\infty$ for $t < 0$. It is easy to
note that

$$F(s, t) = \min_{l = 1, \dots, q} \left(F(s - 1, t - K_l) + E_s^l\right). \tag{31}$$

Introduce the notation$^4$

$$A(s, t) = \operatorname*{argmin}_{l = 1, \dots, q} \left(F(s - 1, t - K_l) + E_s^l\right).$$

To make the formula (31) correct also for $s = 0$, let

$$F(-1, 0) = 0, \qquad F(-1, t) = +\infty, \quad t \ne 0.$$



Now using (31) for sequential computation of $F(s, t)$ for
$s = 0, 1, 2, \dots$ and all $t$, and saving the corresponding
quantities $A(s, t)$, one can compute $F(N - 1, K)$, which by
definition is the minimal possible value of the sum (29) under
the constraint (28). If $F(N - 1, K) = +\infty$, there is no set of
$a_0, a_1, \dots, a_{N-1}$ satisfying the constraint (28).

Let $F(N - 1, K) \ne +\infty$. In order to recover the sequence
$a_0, a_1, \dots, a_{N-1}$ delivering the minimum to the sum (29), we
make a pass in reverse order utilizing the saved quantities
$A(s, t)$,

$$\begin{aligned}
a_{N-1} &= A(N - 1, K), \\
a_{N-2} &= A(N - 2, K - K_{a_{N-1}}), \\
a_{N-3} &= A(N - 3, K - K_{a_{N-1}} - K_{a_{N-2}}), \\
&\ \,\vdots \\
a_i &= A\Big(i, K - \sum_{l=i+1}^{N-1} K_{a_l}\Big), \\
&\ \,\vdots \\
a_0 &= A\Big(0, K - \sum_{l=1}^{N-1} K_{a_l}\Big).
\end{aligned}$$

$^4$ If the minimum is attained for several values of $l$, any of those can serve
as $A(s, t)$.
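The forward pass (31) and the reverse pass can be sketched as follows. This is an illustrative implementation under stated assumptions: the estimates $E_i^k$ are given as a table `E[i][l]`, codes are indexed from 0 rather than from 1, and the name `choose_codes` is hypothetical.

```python
import math

def choose_codes(E, K_list, K_total):
    """Minimize sum_i E[i][a_i] subject to sum_i K_list[a_i] == K_total.

    Returns (minimal sum, [a_0, ..., a_{N-1}]) or None if the constraint
    cannot be met.
    """
    N, q = len(E), len(K_list)
    prev = [0.0] + [math.inf] * K_total       # F(-1, t)
    choice = []                               # choice[s][t] = A(s, t)
    for s in range(N):
        cur = [math.inf] * (K_total + 1)
        arg = [None] * (K_total + 1)
        for t in range(K_total + 1):
            for l in range(q):
                if K_list[l] <= t:            # F(s-1, t') = +inf for t' < 0
                    value = prev[t - K_list[l]] + E[s][l]
                    if value < cur[t]:
                        cur[t], arg[t] = value, l
        choice.append(arg)
        prev = cur
    if math.isinf(prev[K_total]):
        return None
    a, t = [0] * N, K_total
    for s in range(N - 1, -1, -1):            # reverse pass over saved A(s, t)
        a[s] = choice[s][t]
        t -= K_list[a[s]]
    return prev[K_total], a
```

The running time is $O(N \cdot K \cdot q)$ and the memory needed for the table of $A(s, t)$ is $O(N \cdot K)$, which is modest for the code sizes discussed here.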



Now we get back to the problem of estimating $E_i^k$. Since
the channel is symmetric and the code is linear, we assume
the all-zero codeword is sent. Suppose that the columns
$v_0, v_1, \dots, v_{i-1}$ have been estimated correctly and the decoder
is to estimate $v_i$. Next the ML-decoder for the code $C_k$ takes
on input the vector

$$\lambda = [L(v_{0,i}), L(v_{1,i}), L(v_{2,i}), \dots, L(v_{M-1,i})].$$

For convenience, introduce the notation $\lambda_j = L(v_{j,i})$. The
components of $\lambda$ are i.i.d. random variables. Their probability
function (or pdf) $f_i$ can be computed approximately using
the method described in section IV. We can assume that the
column $v_i$ is transmitted via some symmetric channel with
LLR distribution $f_i$. Thus the problem of computing $E_i^k$ is
reduced to the estimation of the error probability for the
ML-decoder on a channel with given probability function $f_i$. As
stated in the introduction, the ML-decoder minimizes the linear
functional

$$\varphi(c) = \sum_{j=0}^{M-1} c_j \lambda_j,$$

where $c = [c_0, c_1, \dots, c_{M-1}]$ runs over all codewords of the
code $C_k$. For the all-zero codeword the functional $\varphi$ is zero.
Hence if a decoding error occurs, there necessarily exists
some codeword $c'$ such that $\varphi(c') \le 0$. The last inequality can
be rewritten as the sum of $w_H(c')$ terms,

$$\sum_{j \in \operatorname{supp} c'} \lambda_j \le 0.$$

Some nonzero codeword $c'$ will be strictly more preferable
than 0 if $\varphi(c') < 0$ and in this case the decoder error will
surely occur. If $\varphi(c') = 0$, the decoder may choose the
correct codeword among those which zero the functional $\varphi$.
For simplicity assume that $\varphi(c') = 0$ also implies the decoder
error. Write the probability of the event that for a fixed $c'$ the
inequality $\varphi(c') \le 0$ holds as

$$\Pr\Big\{\sum_{j \in \operatorname{supp} c'} \lambda_j \le 0\Big\}.$$

The sum consists of $w_H(c')$ i.i.d. random variables with the
probability function $f_i$, therefore the probability function of
the sum is

$$f_i^{* w_H(c')} = \underbrace{f_i * f_i * \dots * f_i}_{w_H(c') \text{ times}}.$$

It follows that the probability of the event $\varphi(c') \le 0$ depends
only on the weight $w$ of the codeword $c'$ and it can be written
as

$$P(f_i, w) = \sum_{x \in \operatorname{supp} f_i^{*w}:\ x \le 0} f_i^{*w}(x).$$

The main contribution to the error probability is made by
codewords of minimal weight. Let $d_k$ be the code distance of
the code $C_k$, and let $m_k$ be the number of different codewords
of weight $d_k$ in the code $C_k$. Then the probability $E_i^k$ may be
estimated as

$$E_i^k \approx m_k \cdot P(f_i, d_k). \tag{32}$$

TABLE I. Code distances of the codes of length 32 used for the
construction of concatenated polar codes
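The estimate (32) needs only a repeated convolution of $f_i$ with itself. The sketch below is a minimal illustration with hypothetical names: probability functions are dictionaries keyed by integer grid indices (an LLR value is the index times the grid step), the convolution is exact with no projection onto a bounded grid, and the event counted is "sum $\le 0$", following the simplifying assumption made above.

```python
def self_convolution(f, w):
    """w-fold convolution f * f * ... * f of a probability function on an integer grid."""
    result = {0: 1.0}
    for _ in range(w):
        nxt = {}
        for a, pa in result.items():
            for b, pb in f.items():
                nxt[a + b] = nxt.get(a + b, 0.0) + pa * pb
        result = nxt
    return result

def pairwise_error(f, w):
    """P(f, w): probability that the sum of w i.i.d. LLRs with law f is <= 0."""
    return sum(p for x, p in self_convolution(f, w).items() if x <= 0)

def column_error_estimate(f, d_k, m_k):
    """Estimate (32): m_k * P(f, d_k)."""
    return m_k * pairwise_error(f, d_k)
```

For a two-point distribution with masses 0.9 at index 1 and 0.1 at index −1, `pairwise_error` gives 0.19 for weight 2 and 0.028 for weight 3, which can be verified by counting sign patterns.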
Fig. 17. Performance of the concatenated polar code of length 1024 =
32 × 32 and rate 1/2 on an AWGN channel estimated using Monte-Carlo
method and using the proposed method



Experiments show that this value is likely to overestimate the
real error probability (computed by a Monte-Carlo simulation)
by a constant factor which does not depend on the channel.
For this reason, in the experiments reported in this paper a simple
empirical technique was used to correct the multiplier $m_k$. Each
of the codes $C_1, C_2, \dots, C_q$ was simulated on an AWGN
channel with different SNR values to obtain its FER curve.
The number $m_k$ was chosen so that the estimate (32) fitted
the experimental curve best. We do not have a theoretical
justification of this procedure, however the results of numerical
experiments show its high accuracy.

In a similar way one can estimate the FER of a concrete
concatenated polar code on a given channel. It is sufficient to
approximate numerically the sum

$$\sum_{i=0}^{N-1} E_i^{a_i} \tag{33}$$

and take it as an upper bound for the block error rate.

Fig. 18. Comparison of performance of usual polar code and concatenated
polar code of length 1024 and rate 1/2 on an AWGN channel (curves:
concatenated polar code, original polar code; horizontal axis:
signal/noise ratio, dB)



B. Numerical experiments 

For the construction of concatenated polar codes consider a
set of 26 different codes of length 32. The numbers of information
bits $K_i$ and the code distances $d_i$ for each code are given in
table I.

The first interesting question is the accuracy of the block
error estimate (33), which is computed approximately. In
fig. 17 we show the performance graph of the concatenated
polar code of length 1024 and rate 1/2 on an AWGN channel.
The code was constructed for the channel with SNR = 2.5 dB.
The solid curve represents the Monte-Carlo estimate, the
dotted curve is the estimate (33) computed approximately. One
can see that the curves are practically identical within the limit
of applicability of the Monte-Carlo method.



Fig. 19. Comparison of performance of usual polar code and concatenated
polar code of length 8192 and rate 1/2 on an AWGN channel (curves:
concatenated polar code, original polar code; horizontal axis:
signal/noise ratio, dB)
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[14] S. B. Korada, E. Sasoglu, and R. L. Urbanke, "Polar codes: Characteriza- 
tion of exponent, bounds, and constructions," CoRR, vol. abs/0901.0536, 
2009. 



Fig. 20. Comparison of performance of usual polar code and concatenated 
polar code of length 8192 and rate | on an AWGN channel 



It is also interesting to compare the performance of a polar
code and of a concatenated polar code of the same length and
rate. Fig. 18 shows the performance graph of the concatenated
polar code of length 1024 and rate 1/2, which was already
presented above, together with the polar code of the same
length and rate. Both codes were constructed for an AWGN
channel with SNR = 2.5 dB. One can see that the concatenated
code outperforms the usual one by an order of magnitude.



The figures 19 and 20 show analogous comparative graphs 
for the codes of length 8192 and rates h and §, respectively. 
Similar to the previous example, concatenated polar codes 
also outperform the usual ones approximately by an order of 
magnitude. 